1. Introduction

A stationary stochastic process that serves as a useful model for time series analysis is the autoregressive process with moving average residuals $\{y_t\}$, which satisfies

(1.1)    $\sum_{s=0}^{p} \beta_s y_{t-s} = \sum_{j=0}^{q} \alpha_j v_{t-j}$ ,

$t = \ldots, -1, 0, 1, \ldots$, where the sequence $\{v_t\}$ consists of independently identically distributed random variables. [See Section 5.8 of T. W. Anderson (1971a) and Box and Jenkins (1970).] To avoid indeterminacy $\beta_0 = \alpha_0 = 1$. (An alternative of specifying the variance of $v_t$ to be 1 and leaving $\alpha_0$ as a free parameter is considered also.) The mean of $y_t$ is independent of $t$ and is taken to be 0 for convenience. (Modifications necessary to account for an arbitrary mean are also discussed.) When $\mathcal{E} y_t = 0$, the stationarity implies

(1.2)    $\mathcal{E} y_t y_s = \sigma(t-s)$ ,

dependent only on the difference of the indices.

We shall assume that the $v_t$'s are normally distributed, that is, that the process is Gaussian. Then the model is completely specified by the coefficients in (1.1) and the variance of $v_t$, say $\sigma^2$.

The statistical problem treated here is to estimate $\beta_1, \ldots, \beta_p$, $\alpha_1, \ldots, \alpha_q$, and $\sigma^2$ on the basis of a set of observations at $T$ successive time points, $y_1, \ldots, y_T$. If $\mathbf{y} = (y_1, \ldots, y_T)'$, the density of the multivariate normal distribution $N(\mathbf{0}, \boldsymbol{\Sigma})$ of $\mathbf{y}$ is

(1.3)    $(2\pi)^{-T/2} |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\tfrac{1}{2} \mathbf{y}' \boldsymbol{\Sigma}^{-1} \mathbf{y}\right)$ ,

where

(1.4)    $\mathcal{E} y_t y_s = \sigma_{ts}$ ,    $t, s = 1, \ldots, T$ ,

is the $t,s$-th element of $\boldsymbol{\Sigma}$. If the distribution is that defined by (1.1), then (1.4) is (1.2); the covariances are functions of the parameters $\beta_1, \ldots, \beta_p$, $\alpha_1, \ldots, \alpha_q$, and $\sigma^2$.

The  method  of  maximum  likelihood  can  be  considered,  but  in  general  an 
explicit  solution  cannot  be  found.  The  approach  of  this  paper  is  to  modify 
the  model  slightly  so  that  the  derivatives  of  the  likelihood  function  set 
equal  to  0  yield  relatively  simple  equations.  Since  these  equations  are 
nonlinear,  an  iterative  procedure  is  proposed  that  yields  asymptotically 
efficient estimates at the first step (as $T \to \infty$).

The estimation problems for the pure autoregressive process and pure moving average process as well as the general mixed model are set up in terms of more general multivariate models. The case of $N$ observations on the vector $\mathbf{y}$ is included. This work is a continuation of earlier research on covariance matrices with linear structure by T. W. Anderson (1969), (1970), (1971b), and (1973). The iterative procedures are extensions of that presented in the last paper, which is essentially the method of scoring (as pointed out to me by J. N. K. Rao).

Durbin (1959), (1960) and A. M. Walker (1961), (1962) have proposed estimates, but they are not asymptotically efficient (as $T \to \infty$). Box and Jenkins (1970) have suggested maximizing the likelihood function by numerical means.

The covariance sequence (1.2) of a stationary process has a spectral representation. In the case of an absolutely continuous spectral distribution function

(1.5)    $\sigma(h) = \int_{-\pi}^{\pi} f(\lambda) \cos \lambda h \, d\lambda$ ,    $h = 0, \pm 1, \ldots$ .

The spectral density $f(\lambda)$ may be determined by

(1.6)    $f(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \sigma(h) \cos \lambda h$

when the series on the right-hand side converges. In the case of model (1.1) the spectral density is

(1.7)    $f(\lambda) = \frac{\sigma^2}{2\pi} \, \frac{\left| \sum_{j=0}^{q} \alpha_j e^{i\lambda j} \right|^2}{\left| \sum_{r=0}^{p} \beta_r e^{i\lambda r} \right|^2}$ .

Clevenson (1970), Parzen (1971), and Hannan (1969) have proposed estimation methods based on the sample spectral density (the so-called periodogram). The relationship between these methods and the ones presented in this paper will be explicated in a later paper.

If we let the right-hand side of (1.1) be $u_t$, the spectral density of the stationary process $\{u_t\}$ is

(1.8)    $f_u(\lambda) = \frac{\sigma^2}{2\pi} \sum_{j=0}^{q} \alpha_j e^{i\lambda j} \sum_{j=0}^{q} \alpha_j e^{-i\lambda j} = \frac{1}{2\pi} \sum_{h=-q}^{q} \sigma_u(h) e^{i\lambda h}$ ,

where

(1.9)    $\sigma_u(h) = \sigma^2 \sum_{k=0}^{q-|h|} \alpha_k \alpha_{k+|h|}$ ,    $h = 0, \pm 1, \ldots, \pm q$ ,

are the nonzero covariances of $\{u_t\}$. The parameters $\alpha_1, \ldots, \alpha_q$, and $\sigma^2$ can be replaced by $\sigma_u(0), \sigma_u(1), \ldots, \sigma_u(q)$. We shall assume the roots of

(1.10)    $M(z) = \sum_{j=0}^{q} \alpha_j z^{q-j} = 0$

are less than 1 in absolute value. Then given $\sigma_u(0), \sigma_u(1), \ldots, \sigma_u(q) \ne 0$, $\sum_{h=-q}^{q} \sigma_u(h) z^h$ can be factored uniquely into $\sigma^2 M(z) M(z^{-1})$, thus defining $\alpha_1, \ldots, \alpha_q$, and $\sigma^2$. [See T. W. Anderson (1971a) and (1971b) for details.]

Estimation of the pure moving average model in terms of $\sigma(0), \sigma(1), \ldots, \sigma(q)$ was treated by T. W. Anderson (1971b), (1973).
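As a numerical illustration of the moving average covariances and spectral density of this section, the following sketch computes the covariances of the moving average part from its coefficients and checks that the two forms of its spectral density agree. The function names and the coefficient values are hypothetical, chosen only for the example.

```python
import numpy as np

def ma_autocovariances(alpha, sigma2=1.0):
    """Covariances sigma_u(0), ..., sigma_u(q) of u_t = sum_j alpha_j v_{t-j}.

    alpha = (alpha_0, ..., alpha_q), sigma2 = variance of v_t.
    """
    a = np.asarray(alpha, dtype=float)
    q = len(a) - 1
    return np.array([sigma2 * np.dot(a[:q + 1 - h], a[h:]) for h in range(q + 1)])

def ma_spectral_density(alpha, sigma2, lam):
    """Spectral density of u_t as sigma^2/(2 pi) times |sum_j alpha_j e^{i lam j}|^2."""
    a = np.asarray(alpha, dtype=float)
    transfer = np.sum(a * np.exp(1j * lam * np.arange(len(a))))
    return sigma2 / (2 * np.pi) * abs(transfer) ** 2

alpha = [1.0, 0.5, -0.3]   # hypothetical alpha_0, alpha_1, alpha_2
sigma2 = 2.0
cov = ma_autocovariances(alpha, sigma2)

# Second form of the density: (1/2 pi) sum_{h=-q}^{q} sigma_u(h) e^{i lam h}
lam = 0.9
density_from_cov = (cov[0] + 2 * sum(cov[h] * np.cos(lam * h) for h in (1, 2))) / (2 * np.pi)
```

The covariances vanish beyond lag $q$, which is why only $q + 1$ numbers are returned.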
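2. Estimation of Coefficients of Linear Transformations to Approximate Autoregressive Processes

2.1 A General Linear Transformation. Suppose $\mathbf{y}$ is a $T$-component random vector defined by

(2.1)    $\sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \mathbf{y} = \mathbf{v}$ ,

where $\mathbf{K}_0, \mathbf{K}_1, \ldots, \mathbf{K}_p$ are $p + 1$ known linearly independent $T \times T$ matrices, $\beta_0 = 1$ and $\beta_1, \ldots, \beta_p$ are $p$ parameters such that $\sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell$ is nonsingular; we assume that there is at least one such set. Suppose $\mathbf{v}$ is a $T$-component random vector with mean vector $\mathcal{E}\mathbf{v} = \mathbf{0}$ and covariance matrix

(2.2)    $\boldsymbol{\Sigma}(\mathbf{v}) = \mathcal{E}\mathbf{v}\mathbf{v}' = \sigma^2 \mathbf{I}$ .

Then

(2.3)    $\mathbf{y} = \left( \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right)^{-1} \mathbf{v}$

has mean vector $\mathcal{E}\mathbf{y} = \mathbf{0}$ and covariance matrix

(2.4)    $\boldsymbol{\Sigma}(\mathbf{y}) = \mathcal{E}\mathbf{y}\mathbf{y}' = \sigma^2 \left( \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right)^{-1} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k' \right)^{-1} = \sigma^2 \left( \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell \right)^{-1}$

with inverse

(2.5)    $\boldsymbol{\Sigma}^{-1}(\mathbf{y}) = \frac{1}{\sigma^2} \sum_{k=0}^{p} \beta_k \mathbf{K}_k' \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell = \frac{1}{\sigma^2} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell$ .

Let $\mathbf{y}_1, \ldots, \mathbf{y}_N$ be $N$ observations on $\mathbf{y}$, and let $L$ denote the likelihood function when $\mathbf{y}$ has a normal distribution. Then

(2.6)    $\frac{2}{N} \log L = -T \log 2\pi - T \log \sigma^2 + 2 \log \left| \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right| - \frac{1}{N\sigma^2} \sum_{\alpha=1}^{N} \mathbf{y}_\alpha' \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k' \right) \left( \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right) \mathbf{y}_\alpha$

         $= -T \log 2\pi - T \log \sigma^2 + 2 \log \left| \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right| - \frac{1}{\sigma^2} \operatorname{tr} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ ,

where

(2.7)    $\mathbf{C} = \frac{1}{N} \sum_{\alpha=1}^{N} \mathbf{y}_\alpha \mathbf{y}_\alpha'$ ,

and tr denotes the trace of the matrix that follows. To find the partial derivatives of (2.6) with respect to $\beta_1, \ldots, \beta_p$ we use the results

(2.8)    $\frac{\partial \log |\mathbf{A}|}{\partial \theta} = \frac{1}{|\mathbf{A}|} \frac{\partial |\mathbf{A}|}{\partial \theta} = \frac{1}{|\mathbf{A}|} \sum_{i,j=1}^{T} \frac{\partial |\mathbf{A}|}{\partial a_{ij}} \frac{\partial a_{ij}}{\partial \theta} = \frac{1}{|\mathbf{A}|} \sum_{i,j=1}^{T} \operatorname{cof} a_{ij} \, \frac{\partial a_{ij}}{\partial \theta} = \operatorname{tr} \mathbf{A}^{-1} \frac{\partial}{\partial \theta} \mathbf{A}$ .

(The cofactor of $a_{ij}$ in $\mathbf{A}$ is denoted by $\operatorname{cof} a_{ij}$.) Then

(2.9)    $\frac{\partial}{\partial \beta_\ell} \frac{2}{N} \log L = 2 \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell - \frac{2}{N\sigma^2} \sum_{\alpha=1}^{N} \sum_{k=0}^{p} \beta_k \mathbf{y}_\alpha' \mathbf{K}_k' \mathbf{K}_\ell \mathbf{y}_\alpha$

         $= 2 \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell - \frac{2}{\sigma^2} \sum_{k=0}^{p} \beta_k \operatorname{tr} \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ ,    $\ell = 1, \ldots, p$ ,

(2.10)    $\frac{\partial}{\partial \sigma^2} \frac{2}{N} \log L = -\frac{T}{\sigma^2} + \frac{1}{\sigma^4} \operatorname{tr} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ .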

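If $N = 1$ and $\mathbf{y}_1 = \mathbf{y}$, the derivatives (2.9) are

(2.11)    $2 \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell - \frac{2}{\sigma^2} \sum_{k=0}^{p} \beta_k \mathbf{y}' \mathbf{K}_k' \mathbf{K}_\ell \mathbf{y}$ ,    $\ell = 1, \ldots, p$ ,

and (2.10) is

(2.12)    $-\frac{T}{\sigma^2} + \frac{1}{\sigma^4} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{y}' \mathbf{K}_k' \mathbf{K}_\ell \mathbf{y}$ .

The maximum likelihood estimates may be defined by setting the derivatives equal to 0. [By the argument used in T. W. Anderson (1970) it follows that there is at least one relative maximum defined by the derivative equations.]

The derivative equations are

(2.13)    $\operatorname{tr} \left( \sum_{k=0}^{p} \hat\beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell = \frac{1}{\hat\sigma^2} \sum_{k=0}^{p} \hat\beta_k \operatorname{tr} \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ ,    $\ell = 1, \ldots, p$ ,

(2.14)    $\hat\sigma^2 = \frac{1}{T} \sum_{k,\ell=0}^{p} \hat\beta_k \hat\beta_\ell \operatorname{tr} \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ .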

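We can develop these equations in an alternative way by letting

(2.15)    $\mathbf{K}_k \mathbf{y}_\alpha = \mathbf{y}_\alpha^{(k)}$ ,    $k = 0, 1, \ldots, p$ ,  $\alpha = 1, \ldots, N$ .

Then

(2.16)    $\frac{2}{N} \log L = -T \log 2\pi - T \log \sigma^2 + 2 \log \left| \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right| - \frac{1}{N\sigma^2} \sum_{\alpha=1}^{N} \left( \sum_{k=0}^{p} \beta_k \mathbf{y}_\alpha^{(k)} \right)' \left( \sum_{\ell=0}^{p} \beta_\ell \mathbf{y}_\alpha^{(\ell)} \right)$

         $= -T \log 2\pi - T \log \sigma^2 + 2 \log \left| \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell \right| - \frac{1}{\sigma^2} \boldsymbol{\beta}' \mathbf{M} \boldsymbol{\beta}$ ,

where

(2.17)    $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$ ,

(2.18)    $\mathbf{M} = \frac{1}{N} \sum_{\alpha=1}^{N} \begin{pmatrix} \mathbf{y}_\alpha^{(0)\prime} \mathbf{y}_\alpha^{(0)} & \mathbf{y}_\alpha^{(0)\prime} \mathbf{y}_\alpha^{(1)} & \cdots & \mathbf{y}_\alpha^{(0)\prime} \mathbf{y}_\alpha^{(p)} \\ \mathbf{y}_\alpha^{(1)\prime} \mathbf{y}_\alpha^{(0)} & \mathbf{y}_\alpha^{(1)\prime} \mathbf{y}_\alpha^{(1)} & \cdots & \mathbf{y}_\alpha^{(1)\prime} \mathbf{y}_\alpha^{(p)} \\ \vdots & \vdots & & \vdots \\ \mathbf{y}_\alpha^{(p)\prime} \mathbf{y}_\alpha^{(0)} & \mathbf{y}_\alpha^{(p)\prime} \mathbf{y}_\alpha^{(1)} & \cdots & \mathbf{y}_\alpha^{(p)\prime} \mathbf{y}_\alpha^{(p)} \end{pmatrix}$ .

The partial derivatives of $(2/N) \log L$ set equal to 0 can be written in terms of the elements of $\mathbf{M}$ as

(2.19)    $\left[ \operatorname{tr} \left( \sum_{k=0}^{p} \hat\beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell \right] = \frac{1}{\hat\sigma^2} \hat{\boldsymbol{\beta}}' \mathbf{M}$ ,

(2.20)    $\hat\sigma^2 = \frac{1}{T} \hat{\boldsymbol{\beta}}' \mathbf{M} \hat{\boldsymbol{\beta}}$ ;

the left-hand side of (2.19) denotes a row vector with the $\ell$-th component given explicitly.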

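If $N > 1$ and $\mathcal{E}\mathbf{y} = \boldsymbol{\mu}$, where $\boldsymbol{\mu}$ is an arbitrary vector, then the sample mean

(2.21)    $\bar{\mathbf{y}} = \frac{1}{N} \sum_{\alpha=1}^{N} \mathbf{y}_\alpha$

is the maximum likelihood estimate of $\boldsymbol{\mu}$, and in the likelihood equations (2.13) and (2.14), $\mathbf{C}$ should be replaced by

(2.22)    $\hat{\mathbf{C}} = \frac{1}{N} \sum_{\alpha=1}^{N} (\mathbf{y}_\alpha - \bar{\mathbf{y}})(\mathbf{y}_\alpha - \bar{\mathbf{y}})'$ .

In some models one wants $\mathcal{E} y_t = \mu$; that is, $\mathcal{E}\mathbf{y} = \mu \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon} = (1, 1, \ldots, 1)'$. Then $2/N$ times the logarithm of the likelihood function is (2.6) with $\mathbf{C}$ replaced by

(2.23)    $\mathbf{C}^* = \frac{1}{N} \sum_{\alpha=1}^{N} (\mathbf{y}_\alpha - \mu \boldsymbol{\varepsilon})(\mathbf{y}_\alpha - \mu \boldsymbol{\varepsilon})'$ .

The derivative of $2/N$ times the logarithm of the likelihood with respect to $\mu$ is

(2.24)    $\frac{\partial}{\partial \mu} \frac{2}{N} \log L = \frac{2}{N\sigma^2} \boldsymbol{\varepsilon}' \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell \sum_{\alpha=1}^{N} (\mathbf{y}_\alpha - \mu \boldsymbol{\varepsilon})$ .

If $\boldsymbol{\varepsilon}$ is a characteristic vector of $\mathbf{K}_0, \mathbf{K}_1, \ldots, \mathbf{K}_p$, then

(2.25)    $\hat\mu = \frac{1}{NT} \sum_{\alpha=1}^{N} \boldsymbol{\varepsilon}' \mathbf{y}_\alpha$ ,

and in the other derivative equations $\mathbf{C}$ is replaced by

(2.26)    $\frac{1}{N} \sum_{\alpha=1}^{N} (\mathbf{y}_\alpha - \hat\mu \boldsymbol{\varepsilon})(\mathbf{y}_\alpha - \hat\mu \boldsymbol{\varepsilon})'$ .

If $\boldsymbol{\varepsilon}$ is not a characteristic vector of $\mathbf{K}_0, \mathbf{K}_1, \ldots, \mathbf{K}_p$, then usually (2.25) will not be the maximum likelihood estimate of $\mu$.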

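The second derivatives of $(2/N) \log L$ defined by (2.6) are

(2.27)    $\frac{\partial^2}{\partial \beta_j \partial \beta_\ell} \frac{2}{N} \log L = -2 \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_j \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell - \frac{2}{\sigma^2} \operatorname{tr} \mathbf{K}_j' \mathbf{K}_\ell \mathbf{C}$ ,    $j, \ell = 1, \ldots, p$ ,

(2.28)    $\frac{\partial^2}{\partial \beta_j \partial \sigma^2} \frac{2}{N} \log L = \frac{2}{\sigma^4} \operatorname{tr} \sum_{\ell=0}^{p} \beta_\ell \mathbf{K}_\ell' \mathbf{K}_j \mathbf{C}$ ,    $j = 1, \ldots, p$ ,

(2.29)    $\frac{\partial^2}{(\partial \sigma^2)^2} \frac{2}{N} \log L = \frac{T}{\sigma^4} - \frac{2}{\sigma^6} \operatorname{tr} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell \mathbf{K}_k' \mathbf{K}_\ell \mathbf{C}$ .

The elements of the information matrix are $N$ times

(2.30)    $-\mathcal{E} \frac{\partial^2}{\partial \beta_j \partial \beta_\ell} \frac{1}{N} \log L = \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_j \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_\ell + \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k' \right)^{-1} \mathbf{K}_j' \mathbf{K}_\ell \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1}$ ,    $j, \ell = 1, \ldots, p$ ,

(2.31)    $-\mathcal{E} \frac{\partial^2}{\partial \beta_j \partial \sigma^2} \frac{1}{N} \log L = -\frac{1}{\sigma^2} \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{K}_k \right)^{-1} \mathbf{K}_j$ ,    $j = 1, \ldots, p$ ,

(2.32)    $-\mathcal{E} \frac{\partial^2}{(\partial \sigma^2)^2} \frac{1}{N} \log L = \frac{T}{2\sigma^4}$ .

As $N \to \infty$, the normalized maximum likelihood estimates have a limiting normal distribution with covariance matrix whose inverse has elements given by (2.30), (2.31), and (2.32).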
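The likelihood of this section can be checked numerically. The sketch below, with hypothetical function names and an arbitrary choice of $\mathbf{K}_0 = \mathbf{I}$ and $\mathbf{K}_1$ equal to a one-step shift matrix, evaluates $(2/N) \log L$ in the trace form and the value of $\sigma^2$ that maximizes it for fixed $\beta$; it is an illustration under these assumptions, not the paper's computational procedure.

```python
import numpy as np

def two_over_n_loglik(beta, sigma2, K, Y):
    """(2/N) log L for the model sum_l beta_l K_l y = v.

    beta includes beta_0 = 1; K is a list of T x T arrays; Y has shape (N, T),
    one observation vector per row.
    """
    A = sum(b * K_l for b, K_l in zip(beta, K))        # sum_l beta_l K_l
    N, T = Y.shape
    C = Y.T @ Y / N                                    # (1/N) sum_a y_a y_a'
    _, log_abs_det = np.linalg.slogdet(A)
    quad = np.trace(A.T @ A @ C)                       # tr sum_{k,l} beta_k beta_l K_k' K_l C
    return -T * np.log(2 * np.pi) - T * np.log(sigma2) + 2 * log_abs_det - quad / sigma2

def sigma2_hat(beta, K, Y):
    """Value of sigma^2 solving the sigma^2 derivative equation for fixed beta."""
    A = sum(b * K_l for b, K_l in zip(beta, K))
    N, T = Y.shape
    C = Y.T @ Y / N
    return np.trace(A.T @ A @ C) / T

# Hypothetical example: K_0 = I and K_1 a one-step shift matrix.
rng = np.random.default_rng(0)
T, N = 5, 4
K = [np.eye(T), np.eye(T, k=-1)]
beta = [1.0, 0.4]
Y = rng.standard_normal((N, T))
value = two_over_n_loglik(beta, 1.7, K, Y)
```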


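2.2 Autoregressive Process Approximated by a Linear Transformation. The autoregressive process $\{y_t\}$ is (1.1) for $\alpha_1 = \ldots = \alpha_q = 0$, that is,

(2.33)    $\sum_{s=0}^{p} \beta_s y_{t-s} = v_t$ ,

$t = \ldots, -1, 0, 1, \ldots$. Let $\mathbf{y} = (y_1, \ldots, y_T)'$. Then the distribution of $y_1, \ldots, y_T$ is approximated by the distribution of $\mathbf{y}$ defined by (2.1) when

(2.34)    $\mathbf{K}_s = \mathbf{L}^s$ ,

where $\mathbf{L}$ is the $T \times T$ matrix with 1's immediately below the main diagonal and 0's elsewhere. Then

(2.35)    $\mathbf{L}^2 = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & 0 \end{pmatrix}$ .

In general $\mathbf{L}^g$ has all 0's except for 1's $g$ units below the main diagonal. We suppose $p + 1 < T$. Note that

(2.36)    $\mathbf{L}^g \mathbf{L}^h = \mathbf{L}^{g+h}$ ,    $g, h = 0, 1, \ldots$ ,

(2.37)    $\mathbf{L}^g = \mathbf{0}$ ,    $g = T, T+1, \ldots$ .

In this case

(2.38)    $\sum_{k=0}^{p} \beta_k \mathbf{L}^k = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \beta_1 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ \beta_p & \beta_{p-1} & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \beta_p & \cdots & \beta_1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \beta_p & \beta_{p-1} & \cdots & 1 \end{pmatrix}$ ,

which is triangular with 0's above the main diagonal and has determinant 1.

The components of (2.1) are

(2.40)    $\sum_{s=0}^{t-1} \beta_s y_{t-s} = v_t$ ,    $t = 1, \ldots, p$ ,

(2.41)    $\sum_{s=0}^{p} \beta_s y_{t-s} = v_t$ ,    $t = p+1, \ldots, T$ .

The equation (2.41) agrees with the autoregressive process (2.33), but the equation (2.40) is such that the sequence $y_1, \ldots, y_T$ does not start out as a stationary process. An alternative way of considering the equation (2.40) is that (2.41) holds with $y_0 = y_{-1} = \ldots = y_{-(p-1)} = 0$.

In this model we are often interested in $N = 1$ and $\mathbf{y}_1 = \mathbf{y}$. Then

(2.42)    $\mathbf{y}^{(k)} = \mathbf{K}_k \mathbf{y} = \mathbf{L}^k \mathbf{y} = (0, \ldots, 0, y_1, \ldots, y_{T-k})'$ ,    $k = 0, 1, \ldots, T-1$ ,

where there are $k$ 0's, and

(2.43)    $\mathbf{L}^k \mathbf{y} = \mathbf{0}$ ,    $k = T, T+1, \ldots$ .

Since $\sum_{k=0}^{p} \beta_k \mathbf{L}^k$ is triangular with 0's above the main diagonal, then $\left( \sum_{k=0}^{p} \beta_k \mathbf{L}^k \right)^{-1}$ is triangular with 0's above the main diagonal, and the determinant of $\sum_{k=0}^{p} \beta_k \mathbf{L}^k$ is 1. [The diagonal terms of $\left( \sum_{k=0}^{p} \beta_k \mathbf{L}^k \right)^{-1} \mathbf{L}^\ell$ are 0, $\ell = 1, \ldots, p$.] Then the derivative of $2/N$ times the logarithm of the determinant with respect to $\beta_\ell$ is

(2.44)    $2 \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{L}^k \right)^{-1} \mathbf{L}^\ell = 0$ ,    $\ell = 1, \ldots, p$ .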
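The properties of the lag matrix used above are easy to confirm numerically. The following sketch builds $\mathbf{L}^g$ with NumPy and forms $\sum_k \beta_k \mathbf{L}^k$ for hypothetical coefficient values; the helper name `lag_matrix` is an assumption made for the example.

```python
import numpy as np

def lag_matrix(T, g=1):
    """L^g: T x T matrix with 1's g units below the main diagonal (zero if g >= T)."""
    return np.eye(T, k=-g)

T = 6
beta = [1.0, -0.5, 0.2]                     # hypothetical beta_0 = 1, beta_1, beta_2
L = lag_matrix(T)
A = sum(b * lag_matrix(T, k) for k, b in enumerate(beta))   # sum_k beta_k L^k
A_inv = np.linalg.inv(A)                    # also lower triangular
y = np.arange(1.0, T + 1)                   # y_1, ..., y_T = 1, ..., 6
y_lag2 = lag_matrix(T, 2) @ y               # two 0's followed by y_1, ..., y_{T-2}
```

Because the inverse is lower triangular with unit diagonal, its product with any positive power of $\mathbf{L}$ has a zero diagonal, which is what makes the determinant term drop out of the derivative equations.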
The derivative equations (2.13) can be written in this case as

(2.45)    $\sum_{k=1}^{p} \hat\beta_k \mathbf{y}^{(k)\prime} \mathbf{y}^{(\ell)} = -\mathbf{y}^{(0)\prime} \mathbf{y}^{(\ell)}$ ,    $\ell = 1, \ldots, p$ .

In components these are

(2.46)    $\sum_{k=1}^{p} \hat\beta_k \sum_{t=1}^{T} y_{t-k} y_{t-\ell} = -\sum_{t=1}^{T} y_t y_{t-\ell}$ ,    $\ell = 1, \ldots, p$ ,

where $y_0 = y_{-1} = \ldots = y_{-(p-1)} = 0$. These are the usual maximum likelihood estimates of $\beta_1, \ldots, \beta_p$ for initial values $y_0 = y_{-1} = \ldots = y_{-(p-1)} = 0$ or the "least squares estimates" since they minimize

(2.47)    $\sum_{t=1}^{T} \left( \sum_{k=0}^{p} \beta_k y_{t-k} \right)^2$ .

[See T. W. Anderson (1971a), Sections 2.2 and 5.4, for example.]
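A minimal sketch of these least squares estimates follows, assuming zero initial values as in the text. The function name `ar_least_squares` and the simulated series are hypothetical; the lagged vectors are built exactly as $\mathbf{y}^{(k)} = \mathbf{L}^k \mathbf{y}$.

```python
import numpy as np

def ar_least_squares(y, p):
    """Estimates of beta_1, ..., beta_p with beta_0 = 1 and y_0 = y_{-1} = ... = 0.

    Solves sum_k beta_k y^(k)' y^(l) = -y^(0)' y^(l), l = 1, ..., p.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Column k holds y^(k): k zeros followed by y_1, ..., y_{T-k}.
    lagged = np.column_stack([np.concatenate([np.zeros(k), y[:T - k]]) for k in range(p + 1)])
    G = lagged.T @ lagged                   # G[k, l] = y^(k)' y^(l)
    return np.linalg.solve(G[1:, 1:], -G[1:, 0])

# Hypothetical data: y_t + beta_1 y_{t-1} = v_t with beta_1 = -0.6 and y_0 = 0.
rng = np.random.default_rng(1)
v = rng.standard_normal(500)
y = np.zeros(500)
for t in range(500):
    y[t] = v[t] + 0.6 * (y[t - 1] if t > 0 else 0.0)
beta_hat = ar_least_squares(y, 2)
```

The same numbers are returned by an ordinary least squares fit of $-\mathbf{y}^{(0)}$ on $\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(p)}$, which is the sense in which these are least squares estimates.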
Let

(2.48)    $c_h = \frac{1}{T} \sum_{t=1}^{T-h} y_t y_{t+h}$ ,    $h = 0, 1, \ldots, T-1$ .

The right-hand side of (2.46) is $-T c_\ell$. The sum

(2.49)    $\sum_{t=1}^{T} y_{t-k} y_{t-\ell}$

differs from $T c_{|k-\ell|}$ by omission of

(2.50)    $\sum_{t=T-\max(k,\ell)+1}^{T-|k-\ell|} y_t y_{t+|k-\ell|}$ .

These terms can be added to the coefficients so as to make the equations agree with

(2.51)    $\sum_{k=1}^{p} \hat\beta_k c_{|k-\ell|} = -c_\ell$ ,    $\ell = 1, \ldots, p$ .

[See T. W. Anderson (1971a), Sec. 5.6, for example.] Then the estimates derived from (2.51) are the coefficients of a stationary process. [See Anderson (1971c), for example.] If we let

(2.52)    $\tilde{\mathbf{y}}^{(k)} = (0, \ldots, 0, y_1, \ldots, y_T, 0, \ldots, 0)'$ ,    $k = 0, 1, \ldots, p$ ,

where the first $k$ components are 0 and the last $p - k$ components are 0, then (2.51) can be written

(2.53)    $\sum_{k=1}^{p} \hat\beta_k \tilde{\mathbf{y}}^{(k)\prime} \tilde{\mathbf{y}}^{(\ell)} = -\tilde{\mathbf{y}}^{(0)\prime} \tilde{\mathbf{y}}^{(\ell)}$ ,    $\ell = 1, \ldots, p$ .
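The equivalence between the equations in the sample covariances $c_h$ and the equations in the zero-padded vectors can be illustrated as follows. This is a sketch with hypothetical function names and arbitrary simulated data; it only checks that the two formulations give the same estimates and that the fitted coefficients correspond to a stationary process.

```python
import numpy as np

def sample_autocov(y, h):
    """c_h = (1/T) sum_{t=1}^{T-h} y_t y_{t+h}."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    return np.dot(y[:T - h], y[h:]) / T

def estimates_from_autocov(y, p):
    """Solve sum_k beta_k c_{|k-l|} = -c_l, l = 1, ..., p."""
    c = np.array([sample_autocov(y, h) for h in range(p + 1)])
    coef = np.array([[c[abs(k - l)] for k in range(1, p + 1)] for l in range(1, p + 1)])
    return np.linalg.solve(coef, -c[1:])

def estimates_from_padded(y, p):
    """Same equations written with the (T + p)-component zero-padded vectors."""
    y = np.asarray(y, dtype=float)
    padded = np.column_stack(
        [np.concatenate([np.zeros(k), y, np.zeros(p - k)]) for k in range(p + 1)]
    )
    G = padded.T @ padded                   # G[k, l] = T * c_{|k-l|}
    return np.linalg.solve(G[1:, 1:], -G[1:, 0])

rng = np.random.default_rng(2)
y = rng.standard_normal(200)                # arbitrary illustrative series
p = 3
b_cov = estimates_from_autocov(y, p)
b_pad = estimates_from_padded(y, p)
roots = np.roots(np.concatenate([[1.0], b_cov]))   # roots of sum_s beta_s x^{p-s} = 0
```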


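In this case of $\mathbf{K}_k = \mathbf{L}^k$ the elements of the information matrix are $N$ times

(2.54)    $-\mathcal{E} \frac{\partial^2}{\partial \beta_j \partial \beta_\ell} \frac{1}{N} \log L = \operatorname{tr} \left( \sum_{k=0}^{p} \beta_k \mathbf{L}^{\prime k} \right)^{-1} \mathbf{L}^{\prime j} \mathbf{L}^\ell \left( \sum_{k=0}^{p} \beta_k \mathbf{L}^k \right)^{-1}$ ,    $j, \ell = 1, \ldots, p$ ,

(2.55)    $-\mathcal{E} \frac{\partial^2}{\partial \beta_j \partial \sigma^2} \frac{1}{N} \log L = 0$ ,    $j = 1, \ldots, p$ ,

and (2.32).

It is of interest to compare the covariance matrix of $\mathbf{y}$ defined by (2.1) with that of $T$ terms from the stationary process defined by (2.33). For $p = 1$ and $\beta_1 = \beta$ the covariance matrix of the stationary process is

(2.56)    $\frac{\sigma^2}{1-\beta^2} \begin{pmatrix} 1 & -\beta & \beta^2 & \cdots & (-\beta)^{T-1} \\ -\beta & 1 & -\beta & \cdots & (-\beta)^{T-2} \\ \beta^2 & -\beta & 1 & \cdots & (-\beta)^{T-3} \\ \vdots & \vdots & \vdots & & \vdots \\ (-\beta)^{T-1} & (-\beta)^{T-2} & (-\beta)^{T-3} & \cdots & 1 \end{pmatrix}$

and

(2.57)    $\boldsymbol{\Sigma}(\mathbf{y}) = \frac{\sigma^2}{1-\beta^2} \begin{pmatrix} 1-\beta^2 & -\beta(1-\beta^2) & \beta^2(1-\beta^2) & \cdots & (-\beta)^{T-1}(1-\beta^2) \\ -\beta(1-\beta^2) & 1-\beta^4 & -\beta(1-\beta^4) & \cdots & (-\beta)^{T-2}(1-\beta^4) \\ \beta^2(1-\beta^2) & -\beta(1-\beta^4) & 1-\beta^6 & \cdots & (-\beta)^{T-3}(1-\beta^6) \\ \vdots & \vdots & \vdots & & \vdots \\ (-\beta)^{T-1}(1-\beta^2) & (-\beta)^{T-2}(1-\beta^4) & (-\beta)^{T-3}(1-\beta^6) & \cdots & 1-\beta^{2T} \end{pmatrix}$ .

For a stationary process $|\beta| < 1$, and hence the $i,j$-th element of $\boldsymbol{\Sigma}(\mathbf{y})$ is close to the $i,j$-th element of (2.56) if $i$ and $j$ are large.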
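The comparison for $p = 1$ can be reproduced numerically. In the sketch below the values of $T$, $\beta$, and $\sigma^2$ are hypothetical; the covariance of the approximating model is computed from the linear transformation itself and compared with the closed form, whose $i,j$-th element is the stationary covariance times $1 - \beta^{2\min(i,j)}$.

```python
import numpy as np

T, beta, sigma2 = 8, 0.6, 1.0               # hypothetical values with |beta| < 1
A = np.eye(T) + beta * np.eye(T, k=-1)      # I + beta L
A_inv = np.linalg.inv(A)
cov_approx = sigma2 * A_inv @ A_inv.T       # covariance of y = (I + beta L)^{-1} v

i, j = np.indices((T, T)) + 1               # 1-based row and column indices
cov_stationary = sigma2 / (1 - beta**2) * (-beta) ** np.abs(i - j)
cov_closed_form = cov_stationary * (1 - beta ** (2 * np.minimum(i, j)))

# The gap to the stationary covariance shrinks as both indices grow.
gap = np.abs(cov_approx - cov_stationary)
```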


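3. Estimation of Coefficients of Linear Transformation to Approximate Moving Average Processes

3.1 A General Linear Transformation. Another model is defined by

(3.1)    $\mathbf{y} = \sum_{k=0}^{q} \alpha_k \mathbf{J}_k \mathbf{v}$ ,

where $\mathbf{J}_0, \mathbf{J}_1, \ldots, \mathbf{J}_q$ are $q + 1$ known linearly independent $T \times T$ matrices, $\alpha_0 = 1$, and $\alpha_1, \ldots, \alpha_q$ are $q$ parameters such that $\sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell$ is nonsingular; we assume that there is at least one such set. Suppose $\mathbf{v}$ is a random vector with mean vector $\mathcal{E}\mathbf{v} = \mathbf{0}$ and covariance matrix $\mathcal{E}\mathbf{v}\mathbf{v}' = \sigma^2 \mathbf{I}$. Then the mean vector of $\mathbf{y}$ is $\mathcal{E}\mathbf{y} = \mathbf{0}$ and the covariance matrix is

(3.2)    $\boldsymbol{\Sigma}(\mathbf{y}) = \mathcal{E}\mathbf{y}\mathbf{y}' = \sigma^2 \sum_{k,\ell=0}^{q} \alpha_k \alpha_\ell \mathbf{J}_k \mathbf{J}_\ell' = \sigma^2 \left( \sum_{k=0}^{q} \alpha_k \mathbf{J}_k \right) \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell' \right)$ .

If $L$ denotes the likelihood function, then

(3.3)    $\frac{2}{N} \log L = -T \log 2\pi - T \log \sigma^2 - 2 \log \left| \sum_{k=0}^{q} \alpha_k \mathbf{J}_k \right| - \frac{1}{N\sigma^2} \sum_{\alpha=1}^{N} \mathbf{y}_\alpha' \left( \sum_{k=0}^{q} \alpha_k \mathbf{J}_k' \right)^{-1} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{y}_\alpha$

         $= -T \log 2\pi - T \log \sigma^2 - 2 \log \left| \sum_{k=0}^{q} \alpha_k \mathbf{J}_k \right| - \frac{1}{\sigma^2} \operatorname{tr} \left( \sum_{k=0}^{q} \alpha_k \mathbf{J}_k' \right)^{-1} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{C}$ .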


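We use the result that

(3.4)    $\frac{\partial}{\partial \theta} \mathbf{A}^{-1} = -\mathbf{A}^{-1} \left( \frac{\partial}{\partial \theta} \mathbf{A} \right) \mathbf{A}^{-1}$ ,

which follows from differentiating $\mathbf{A} \mathbf{A}^{-1} = \mathbf{I}$. The partial derivatives of $(2/N) \log L$ are

(3.5)    $\frac{\partial}{\partial \alpha_j} \frac{2}{N} \log L = -2 \operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{J}_j + \frac{2}{\sigma^2} \operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{C} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell' \right)^{-1} \mathbf{J}_j' \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell' \right)^{-1}$ ,    $j = 1, \ldots, q$ ,

(3.6)    $\frac{\partial}{\partial \sigma^2} \frac{2}{N} \log L = -\frac{T}{\sigma^2} + \frac{1}{\sigma^4} \operatorname{tr} \left( \sum_{k=0}^{q} \alpha_k \mathbf{J}_k' \right)^{-1} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{C}$ .

The likelihood equations can be written [with the second term on the right-hand side of (3.5) transposed]

(3.7)    $\operatorname{tr} \left( \sum_{k=0}^{q} \hat\alpha_k \mathbf{J}_k \right)^{-1} \mathbf{J}_j = \frac{1}{\hat\sigma^2} \operatorname{tr} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{J}_j \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{C} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{J}_\ell' \right)^{-1}$ ,    $j = 1, \ldots, q$ ,

(3.8)    $\hat\sigma^2 = \frac{1}{T} \operatorname{tr} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{C} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{J}_\ell' \right)^{-1}$ .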


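The second partial derivatives of $(2/N) \log L$ are, writing $\mathbf{B} = \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell$ for brevity,

(3.9)    $\frac{\partial^2}{\partial \alpha_i \partial \alpha_j} \frac{2}{N} \log L = 2 \operatorname{tr} \mathbf{B}^{-1} \mathbf{J}_i \mathbf{B}^{-1} \mathbf{J}_j - \frac{2}{\sigma^2} \operatorname{tr} \mathbf{B}^{-1} \mathbf{J}_i \mathbf{B}^{-1} \mathbf{C} \mathbf{B}^{\prime -1} \mathbf{J}_j' \mathbf{B}^{\prime -1} - \frac{2}{\sigma^2} \operatorname{tr} \mathbf{B}^{-1} \mathbf{C} \mathbf{B}^{\prime -1} \mathbf{J}_i' \mathbf{B}^{\prime -1} \mathbf{J}_j' \mathbf{B}^{\prime -1} - \frac{2}{\sigma^2} \operatorname{tr} \mathbf{B}^{-1} \mathbf{C} \mathbf{B}^{\prime -1} \mathbf{J}_j' \mathbf{B}^{\prime -1} \mathbf{J}_i' \mathbf{B}^{\prime -1}$ ,    $i, j = 1, \ldots, q$ ,

(3.10)    $\frac{\partial^2}{\partial \alpha_j \partial \sigma^2} \frac{2}{N} \log L = -\frac{2}{\sigma^4} \operatorname{tr} \mathbf{B}^{-1} \mathbf{C} \mathbf{B}^{\prime -1} \mathbf{J}_j' \mathbf{B}^{\prime -1}$ ,    $j = 1, \ldots, q$ ,

(3.11)    $\frac{\partial^2}{(\partial \sigma^2)^2} \frac{2}{N} \log L = \frac{T}{\sigma^4} - \frac{2}{\sigma^6} \operatorname{tr} \mathbf{B}^{\prime -1} \mathbf{B}^{-1} \mathbf{C}$ .

The information matrix has elements which are $N$ times

(3.12)    $-\mathcal{E} \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} \frac{1}{N} \log L = \operatorname{tr} \mathbf{B}^{-1} \mathbf{J}_i \mathbf{B}^{-1} \mathbf{J}_j + \operatorname{tr} \mathbf{B}^{-1} \mathbf{J}_i \mathbf{J}_j' \mathbf{B}^{\prime -1}$ ,    $i, j = 1, \ldots, q$ ,

(3.13)    $-\mathcal{E} \frac{\partial^2}{\partial \alpha_j \partial \sigma^2} \frac{1}{N} \log L = \frac{1}{\sigma^2} \operatorname{tr} \mathbf{B}^{-1} \mathbf{J}_j$ ,    $j = 1, \ldots, q$ ,

(3.14)    $-\mathcal{E} \frac{\partial^2}{(\partial \sigma^2)^2} \frac{1}{N} \log L = \frac{T}{2\sigma^4}$ .

As $N \to \infty$, the maximum likelihood estimates have a limiting normal distribution with covariance matrix whose inverse has elements given by (3.12), (3.13), and (3.14).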


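The likelihood equations (3.7) and (3.8) cannot in general be solved explicitly. However, the method of scoring can be used. If $L(\mathbf{y}|\boldsymbol{\theta})$ is the likelihood function of a vector parameter $\boldsymbol{\theta}$, the Taylor's expansion of the (vector) derivative is

(3.15)    $\frac{\partial}{\partial \boldsymbol{\theta}} \log L(\mathbf{y}|\boldsymbol{\theta}) = \left. \frac{\partial}{\partial \boldsymbol{\theta}} \log L(\mathbf{y}|\boldsymbol{\theta}) \right|_{\boldsymbol{\theta}=\boldsymbol{\theta}^*} + \left. \frac{\partial^2 \log L(\mathbf{y}|\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}'} \right|_{\boldsymbol{\theta}=\boldsymbol{\theta}^*} (\boldsymbol{\theta} - \boldsymbol{\theta}^*) + \mathbf{R}(\mathbf{y}|\boldsymbol{\theta}, \boldsymbol{\theta}^*)$ .

The matrix $(\partial^2 / \partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}') \log L(\mathbf{y}|\boldsymbol{\theta})$ will be close to its expected value, which is a function of $\boldsymbol{\theta}$, taken to be the "true" value of the parameter vector. Under certain conditions if $\boldsymbol{\theta}^*$ is a consistent estimate of the "true" value, the solution to

(3.16)    $-\mathcal{E} \left. \frac{\partial^2 \log L(\mathbf{y}|\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}'} \right|_{\boldsymbol{\theta}=\boldsymbol{\theta}^*} (\boldsymbol{\theta} - \boldsymbol{\theta}^*) = \left. \frac{\partial}{\partial \boldsymbol{\theta}} \log L(\mathbf{y}|\boldsymbol{\theta}) \right|_{\boldsymbol{\theta}=\boldsymbol{\theta}^*}$

is a consistent, asymptotically efficient and asymptotically normal estimate of $\boldsymbol{\theta}$. The procedure can be iterated; in suitable circumstances the sequence of vectors will converge to the maximum likelihood estimate, that is, a solution to the left-hand side of (3.15) set equal to $\mathbf{0}$.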


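In the present case let $\hat\alpha_1^{(0)}, \ldots, \hat\alpha_q^{(0)}, \hat\sigma_0^2$ be a set of initial estimates, and let $\hat\alpha_1^{(i)}, \ldots, \hat\alpha_q^{(i)}, \hat\sigma_i^2$ be the solution to the $i$-th set of equations. It will be convenient to let

(3.17)    $\hat{\mathbf{A}}_{i-1} = \sum_{k=0}^{q} \hat\alpha_k^{(i-1)} \mathbf{J}_k$ .

Then the $i$-th iteration involves the equations

(3.18)    $\sum_{j=1}^{q} \left[ \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \mathbf{J}_j' \hat{\mathbf{A}}_{i-1}^{\prime -1} \right] \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) + \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \left( \hat\sigma_i^2 - \hat\sigma_{i-1}^2 \right)$

         $= -\operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g + \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{J}_g' \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ ,

(3.19)    $\frac{1}{\hat\sigma_{i-1}^2} \sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) + \frac{T}{2\hat\sigma_{i-1}^4} \left( \hat\sigma_i^2 - \hat\sigma_{i-1}^2 \right) = -\frac{T}{2\hat\sigma_{i-1}^2} + \frac{1}{2\hat\sigma_{i-1}^4} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C}$ .

These reduce to

(3.20)    $\sum_{j=1}^{q} \left[ \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \mathbf{J}_j' \hat{\mathbf{A}}_{i-1}^{\prime -1} \right] \hat\alpha_j^{(i)} + \frac{\hat\sigma_i^2}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g$

         $= 2 \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g + \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{J}_g' \hat{\mathbf{A}}_{i-1}^{\prime -1} - \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_0 - \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \mathbf{J}_0' \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ ,

(3.21)    $\sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j \, \hat\alpha_j^{(i)} + \frac{T}{2\hat\sigma_{i-1}^2} \hat\sigma_i^2 = T + \frac{1}{2\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} - \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_0$ .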

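If $\sigma^2 = 1$ and $\alpha_0$ is a free parameter (not specified), the likelihood satisfies (3.3) with $\sigma^2 = 1$, the first partial derivatives are (3.5) for $j = 0, 1, \ldots, q$, the elements of the information matrix are $N$ times (3.12) for $i, j = 0, 1, \ldots, q$, and the equations for scoring are

(3.22)    $\sum_{j=0}^{q} \left[ \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \mathbf{J}_j' \hat{\mathbf{A}}_{i-1}^{\prime -1} \right] \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) = -\operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{J}_g' \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 0, 1, \ldots, q$ .

These reduce to

(3.23)    $\sum_{j=0}^{q} \left[ \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_j + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g \mathbf{J}_j' \hat{\mathbf{A}}_{i-1}^{\prime -1} \right] \hat\alpha_j^{(i)} = \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{J}_g + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{J}_g' \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 0, 1, \ldots, q$ .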


3.2 Moving Average Process Approximated by a Linear Transformation. The moving average process $\{y_t\}$ is (1.1) for $\beta_1 = \ldots = \beta_p = 0$, that is,

(3.24)    $y_t = \sum_{j=0}^{q} \alpha_j v_{t-j}$ ,

$t = \ldots, -1, 0, 1, \ldots$. Then the distribution of $y_1, \ldots, y_T$ is approximated by the distribution of $\mathbf{y}$ defined by (3.1) when $\mathbf{J}_g = \mathbf{L}^g$, $g = 0, 1, \ldots, q$. The components of (3.1) are

(3.25)    $y_t = \sum_{j=0}^{t-1} \alpha_j v_{t-j}$ ,    $t = 1, \ldots, q$ ,

(3.26)    $y_t = \sum_{j=0}^{q} \alpha_j v_{t-j}$ ,    $t = q+1, \ldots, T$ .

The equations (3.26) correspond to a moving average process; the moving averages of the first $q$ observations, represented by (3.25), are truncated. The covariance matrix of the moving average process defined by (3.24) is

(3.27)    $\sigma^2 \sum_{j=0}^{q} \alpha_j^2 \, \mathbf{I} + \sigma^2 \sum_{i=1}^{q} \sum_{j=0}^{q-i} \alpha_j \alpha_{j+i} \left( \mathbf{L}^i + \mathbf{L}^{\prime i} \right)$ .

This is of the form considered in T. W. Anderson (1969), (1970), (1971b), and (1973), namely

(3.28)    $\sum_{g=0}^{q} \sigma_g \mathbf{G}_g$ ,

where $\mathbf{G}_0 = \mathbf{I}$ and

(3.29)    $\mathbf{G}_g = \mathbf{L}^g + \mathbf{L}^{\prime g}$ ,    $g = 1, \ldots, q$ .


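The covariance matrix of $y_1, \ldots, y_T$ defined by (3.25) and (3.26) is for $q = 2$, for example,

(3.30)    $\sigma^2 \begin{pmatrix} \alpha_0^2 & \alpha_0 \alpha_1 & \alpha_0 \alpha_2 & 0 & \cdots & 0 \\ \alpha_0 \alpha_1 & \alpha_0^2 + \alpha_1^2 & \alpha_0 \alpha_1 + \alpha_1 \alpha_2 & \alpha_0 \alpha_2 & \cdots & 0 \\ \alpha_0 \alpha_2 & \alpha_0 \alpha_1 + \alpha_1 \alpha_2 & \alpha_0^2 + \alpha_1^2 + \alpha_2^2 & \alpha_0 \alpha_1 + \alpha_1 \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \alpha_0^2 + \alpha_1^2 + \alpha_2^2 \end{pmatrix}$ .

This matrix differs from (3.27) for $q = 2$ in the upper left-hand $2 \times 2$ submatrix in (3.30). If $T$ is large relative to $q$ the difference between the two models will not be important; the model (3.1) with $\mathbf{J}_j = \mathbf{L}^j$ can be considered as an approximation to the moving average process.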
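To see where the truncated model departs from the stationary moving average process, one can build both covariance matrices and compare them. The sketch below uses hypothetical function names and coefficient values; it constructs the truncated covariance from the linear transformation with powers of the lag matrix and the stationary covariance from the moving average covariances at lags $0, \ldots, q$.

```python
import numpy as np

def ma_cov_truncated(alpha, T, sigma2=1.0):
    """sigma^2 A A' with A = sum_k alpha_k L^k; alpha = (alpha_0, ..., alpha_q)."""
    A = sum(a * np.eye(T, k=-lag) for lag, a in enumerate(alpha))
    return sigma2 * A @ A.T

def ma_cov_stationary(alpha, T, sigma2=1.0):
    """Covariance matrix of T successive terms of the stationary moving average."""
    a = np.asarray(alpha, dtype=float)
    q = len(a) - 1
    S = sigma2 * np.dot(a, a) * np.eye(T)
    for i in range(1, q + 1):
        s_i = sigma2 * np.dot(a[:q + 1 - i], a[i:])     # covariance at lag i
        S += s_i * (np.eye(T, k=-i) + np.eye(T, k=i))
    return S

alpha = [1.0, 0.5, -0.3]        # hypothetical alpha_0, alpha_1, alpha_2 (q = 2)
T, q = 6, 2
diff = ma_cov_stationary(alpha, T) - ma_cov_truncated(alpha, T)
```

For these values `diff` is zero outside its upper left-hand $2 \times 2$ block, in line with the remark above.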


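When $\mathbf{J}_j = \mathbf{L}^j$,

(3.31)    $\operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{J}_j = \operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \right)^{-1} \mathbf{L}^j = 0$ ,    $j = 1, \ldots, q$ ,

(3.32)    $\operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{J}_\ell \right)^{-1} \mathbf{J}_0 = \operatorname{tr} \left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \right)^{-1} = T$ .

The likelihood equations (3.7) and (3.8) for $\hat\alpha_1, \ldots, \hat\alpha_q$ and $\hat\sigma^2$ (with $\hat\alpha_0 = 1$) are

(3.33)    $\operatorname{tr} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{L}^\ell \right)^{-1} \mathbf{L}^g \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{L}^\ell \right)^{-1} \mathbf{C} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{L}^{\prime \ell} \right)^{-1} = 0$ ,    $g = 1, \ldots, q$ ,

(3.34)    $\hat\sigma^2 = \frac{1}{T} \operatorname{tr} \left( \sum_{k=0}^{q} \hat\alpha_k \mathbf{L}^{\prime k} \right)^{-1} \left( \sum_{\ell=0}^{q} \hat\alpha_\ell \mathbf{L}^\ell \right)^{-1} \mathbf{C}$ .

The method of scoring leads to

(3.35)    $\sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) = \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ ,

(3.36)    $\frac{T}{2\hat\sigma_{i-1}^4} \left( \hat\sigma_i^2 - \hat\sigma_{i-1}^2 \right) = -\frac{T}{2\hat\sigma_{i-1}^2} + \frac{1}{2\hat\sigma_{i-1}^4} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C}$ .

These can be written

(3.37)    $\sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \, \hat\alpha_j^{(i)} = -\operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \hat{\mathbf{A}}_{i-1}^{\prime -1} + \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ ,

(3.38)    $\hat\sigma_i^2 = \frac{1}{T} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C}$ .

The set of linear equations (3.37) are solved for $\hat\alpha_1^{(i)}, \ldots, \hat\alpha_q^{(i)}$.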
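One scoring step for the truncated moving average model can be sketched as below: the linear equations in the new $\hat\alpha_j^{(i)}$ are assembled from traces and solved, and $\hat\sigma_i^2$ follows from the trace of $\hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C}$. The function names are hypothetical, dense matrix inverses are used for clarity rather than efficiency, and the example only verifies that the true parameters are a fixed point when $\mathbf{C}$ equals the model covariance.

```python
import numpy as np

def shift(T, g):
    """L^g: 1's g units below the main diagonal."""
    return np.eye(T, k=-g)

def ma_scoring_step(alpha, sigma2, C):
    """One scoring step for alpha_1..alpha_q and sigma^2 (alpha_0 = 1 fixed).

    alpha: current (alpha_1, ..., alpha_q); sigma2: current sigma^2;
    C: T x T matrix (1/N) sum_a y_a y_a'.
    """
    alpha = np.asarray(alpha, dtype=float)
    T, q = C.shape[0], len(alpha)
    A = np.eye(T) + sum(a * shift(T, g) for g, a in enumerate(alpha, start=1))
    A_inv = np.linalg.inv(A)
    W = A_inv @ C @ A_inv.T                              # A^{-1} C A'^{-1}
    lhs = np.empty((q, q))
    rhs = np.empty(q)
    for g in range(1, q + 1):
        Lg = shift(T, g)
        for j in range(1, q + 1):
            lhs[g - 1, j - 1] = np.trace(A_inv @ Lg @ shift(T, j).T @ A_inv.T)
        rhs[g - 1] = (-np.trace(A_inv @ Lg @ A_inv.T)
                      + np.trace(W @ Lg.T @ A_inv.T) / sigma2)
    alpha_new = np.linalg.solve(lhs, rhs)
    sigma2_new = np.trace(A_inv.T @ A_inv @ C) / T
    return alpha_new, sigma2_new

# Illustrative check with the population covariance of the truncated model.
T = 12
alpha_true, sigma2_true = np.array([0.5, -0.3]), 2.0     # hypothetical values
A_true = np.eye(T) + sum(a * shift(T, g) for g, a in enumerate(alpha_true, start=1))
C_pop = sigma2_true * A_true @ A_true.T
alpha_next, sigma2_next = ma_scoring_step(alpha_true, sigma2_true, C_pop)
```

In practice the step would be repeated from consistent initial estimates, feeding each output back in as the next input.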

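If the parameters are $\alpha_0, \alpha_1, \ldots, \alpha_q$ $(\sigma^2 = 1)$, then the likelihood equations are (3.33) for $g = 0, 1, \ldots, q$.

The equations for scoring are

(3.39)    $\left( \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-2} + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \hat{\mathbf{A}}_{i-1}^{\prime -1} \right) \left( \hat\alpha_0^{(i)} - \hat\alpha_0^{(i-1)} \right) + \sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) = -\operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -2}$ ,

(3.40)    $\sum_{j=0}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \left( \hat\alpha_j^{(i)} - \hat\alpha_j^{(i-1)} \right) = \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ .

These reduce to

(3.41)    $\left( \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-2} + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \hat{\mathbf{A}}_{i-1}^{\prime -1} \right) \hat\alpha_0^{(i)} + \sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \, \hat\alpha_j^{(i)} = \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-2} \, \hat\alpha_0^{(i-1)} + \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -2}$ ,

(3.42)    $\sum_{j=0}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \, \hat\alpha_j^{(i)} = \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ .

These form a set of $q + 1$ linear equations in $q + 1$ unknowns.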

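If $N = 1$ and $\mathbf{y}_1 = \mathbf{y}$, then $\mathbf{C} = \mathbf{y}\mathbf{y}'$ and

(3.43)    $\operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{C} \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1} = \mathbf{y}' \hat{\mathbf{A}}_{i-1}^{\prime -1} \mathbf{L}^{\prime g} \hat{\mathbf{A}}_{i-1}^{\prime -1} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y}$ ,    $g = 0, 1, \ldots, q$ .

The equations (3.37) and (3.38) are then

(3.44)    $\sum_{j=1}^{q} \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \mathbf{L}^{\prime j} \hat{\mathbf{A}}_{i-1}^{\prime -1} \, \hat\alpha_j^{(i)} = \frac{1}{\hat\sigma_{i-1}^2} \left( \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y} \right)' \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \left( \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y} \right) - \operatorname{tr} \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{L}^g \hat{\mathbf{A}}_{i-1}^{\prime -1}$ ,    $g = 1, \ldots, q$ ,

(3.45)    $\hat\sigma_i^2 = \frac{1}{T} \left( \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y} \right)' \left( \hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y} \right)$ .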


The calculation of $\hat{\mathbf{A}}_{i-1}^{-1} \mathbf{y}$ can be done by solving

(3.46)    $\sum_{\ell=0}^{q} \hat\alpha_\ell^{(i-1)} \mathbf{L}^\ell \mathbf{z} = \mathbf{y}$ .

The matrix of coefficients has the form (2.38) (with $\beta_\ell$ replaced by $\hat\alpha_\ell^{(i-1)}$, $\ell = 1, \ldots, q$). The component equations are $z_1 = y_1$,

(3.47)    $z_t + \sum_{s=1}^{t-1} \hat\alpha_s^{(i-1)} z_{t-s} = y_t$ ,    $t = 2, \ldots, q$ ,

(3.48)    $z_t + \sum_{s=1}^{q} \hat\alpha_s^{(i-1)} z_{t-s} = y_t$ ,    $t = q+1, \ldots, T$ .

These can be solved successively for $z_1, \ldots, z_T$. Each component $z_t$ involves at most $q$ multiplications and the entire solution less than $qT$ multiplications.
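The successive solution described above is ordinary forward substitution restricted to a band of width $q$. A minimal sketch follows, with a hypothetical function name and made-up inputs; indices in the code are 0-based, so the paper's $t$ corresponds to `t + 1`.

```python
import numpy as np

def solve_banded_lower(alpha, y):
    """Solve (I + sum_{l=1}^{q} alpha_l L^l) z = y by successive substitution.

    alpha = (alpha_1, ..., alpha_q); alpha_0 = 1 is implied.
    """
    y = np.asarray(y, dtype=float)
    q, T = len(alpha), len(y)
    z = np.zeros(T)
    for t in range(T):
        acc = y[t]
        # At most q multiplications for each component.
        for s in range(1, min(t, q) + 1):
            acc -= alpha[s - 1] * z[t - s]
        z[t] = acc
    return z

alpha = [0.5, -0.3]                         # hypothetical alpha_1, alpha_2
y = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
z = solve_banded_lower(alpha, y)
```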


The first column of $\hat{\mathbf{A}}_{i-1}^{-1}$ can be obtained by solving (3.46) with $\mathbf{y}$ replaced by the first column of $\mathbf{I}$. Thus $z_1 = 1$ and the successive calculations are

(3.49)    $z_t = -\sum_{s=1}^{t-1} \hat\alpha_s^{(i-1)} z_{t-s}$ ,    $t = 2, \ldots, q$ ,

(3.50)    $z_t = -\sum_{s=1}^{q} \hat\alpha_s^{(i-1)} z_{t-s}$ ,    $t = q+1, \ldots, T$ .

The $(j+1)$-th column of $\hat{\mathbf{A}}_{i-1}^{-1}$ is simply $\mathbf{L}^j$ times the first column; that is, it is the first column displaced by $j$ units for

(3.51)    $\mathbf{L}^j \mathbf{z} = (0, \ldots, 0, z_1, \ldots, z_{T-j})'$ ,    $j = 1, \ldots, T-1$ ,

(3.52)    $\mathbf{L}^j \mathbf{z} = \mathbf{0}$ ,    $j = T, T+1, \ldots$ .

Thus the calculation of $\hat{\mathbf{A}}_{i-1}^{-1}$ involves less than $Tq$ multiplications.
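The construction of the whole inverse from its first column can be sketched as follows. The function name and the coefficient values are hypothetical; the recursion fills the first column and each further column is that column displaced downward.

```python
import numpy as np

def inverse_from_first_column(alpha, T):
    """Inverse of I + sum_{l=1}^{q} alpha_l L^l, built from its first column.

    alpha = (alpha_1, ..., alpha_q).
    """
    q = len(alpha)
    z = np.zeros(T)
    z[0] = 1.0                                           # z_1 = 1
    for t in range(1, T):                                # 0-based index
        z[t] = -sum(alpha[s - 1] * z[t - s] for s in range(1, min(t, q) + 1))
    # Column j + 1 is L^j z: j zeros followed by z_1, ..., z_{T-j}.
    columns = [np.concatenate([np.zeros(j), z[:T - j]]) for j in range(T)]
    return np.column_stack(columns)

alpha = [0.5, -0.3]                                      # hypothetical alpha_1, alpha_2
T = 7
A = np.eye(T) + sum(a * np.eye(T, k=-l) for l, a in enumerate(alpha, start=1))
A_inv = inverse_from_first_column(alpha, T)
```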

Another way of looking at the calculation of $\left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \right)^{-1}$, where we drop the caret and superscript on $\hat\alpha_\ell^{(i-1)}$ for convenience, is to see that

(3.53)    $\mathbf{I} = \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \sum_{j=0}^{T-1} \delta_j \mathbf{L}^j = \sum_{\ell=0}^{q} \sum_{j=0}^{T-1} \alpha_\ell \delta_j \mathbf{L}^{\ell+j} = \sum_{i=0}^{T-1} \sum_{\ell+j=i} \alpha_\ell \delta_j \mathbf{L}^i$

because $\mathbf{L}^i = \mathbf{0}$ for $i = T, T+1, \ldots$, if

(3.54)    $\delta_0 = 1$ ,

(3.55)    $\sum_{\ell=0}^{i} \alpha_\ell \delta_{i-\ell} = 0$ ,    $i = 1, \ldots, q-1$ ,

(3.56)    $\sum_{\ell=0}^{q} \alpha_\ell \delta_{i-\ell} = 0$ ,    $i = q, q+1, \ldots$ .

The coefficients $\delta_0, \delta_1, \ldots$ satisfy the homogeneous linear difference equation (3.56) with $q$ boundary conditions (3.54) and (3.55). Therefore

(3.57)    $\delta_i = \sum_{\ell=1}^{q} k_\ell z_\ell^i$ ,    $i = 0, 1, \ldots$ ,

where $z_1, \ldots, z_q$ are the roots of the associated polynomial equation

(3.58)    $\sum_{\ell=0}^{q} \alpha_\ell z^{q-\ell} = 0$ ,

and $k_1, \ldots, k_q$ are determined so (3.57) satisfies the boundary conditions (3.54) and (3.55). [The form (3.58) is on the basis that the $q$ roots are different.] Then the inverse is

(3.59)    $\left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \right)^{-1} = \sum_{i=0}^{\infty} \delta_i \mathbf{L}^i = \sum_{i=0}^{T-1} \delta_i \mathbf{L}^i$ .

It may be observed that (3.54), (3.55), and (3.58) are identical to (39) and (40) of Section 5.2 of T. W. Anderson (1971a) with $\beta_j$ replaced by $\alpha_j$ and $p$ replaced by $q$. Thus the coefficients $\delta_0, \delta_1, \ldots$ correspond to the moving average representation of an autoregressive process with coefficients $1, \alpha_1, \ldots, \alpha_q$.

(3.60)    $\left( \sum_{\ell=0}^{q} \alpha_\ell \mathbf{L}^\ell \right)^{-1} \mathbf{L}^k = \sum_{i=0}^{T-1} \delta_i \mathbf{L}^{i+k} = \sum_{i=0}^{T-1-k} \delta_i \mathbf{L}^{i+k}$ ,

because $\mathbf{L}^{i+k} = \mathbf{0}$ if $i + k \ge T$.
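The two descriptions of the coefficients $\delta_i$, by recursion and by the roots of the associated polynomial, can be compared numerically. The sketch below uses hypothetical function names and coefficient values and assumes the $q$ roots are distinct, as the text requires; the constants $k_\ell$ are obtained by matching the first $q$ values of $\delta_i$.

```python
import numpy as np

def delta_by_recursion(alpha, n):
    """delta_0, ..., delta_{n-1} from delta_0 = 1 and the difference equation.

    alpha = (alpha_1, ..., alpha_q); alpha_0 = 1 is implied.
    """
    q = len(alpha)
    d = np.zeros(n)
    d[0] = 1.0
    for i in range(1, n):
        d[i] = -sum(alpha[l - 1] * d[i - l] for l in range(1, min(i, q) + 1))
    return d

def delta_by_roots(alpha, n):
    """delta_i = sum_l k_l z_l^i, valid when the q roots are distinct."""
    q = len(alpha)
    roots = np.roots(np.concatenate([[1.0], alpha]))     # roots of sum_l alpha_l z^{q-l} = 0
    V = np.vander(roots, q, increasing=True).T           # V[i, l] = z_l^i, i = 0..q-1
    k = np.linalg.solve(V, delta_by_recursion(alpha, q).astype(complex))
    powers = roots[None, :] ** np.arange(n)[:, None]
    return (powers @ k).real

alpha = [0.5, -0.3]                                      # hypothetical alpha_1, alpha_2
T = 8
d_rec = delta_by_recursion(alpha, T)
d_roots = delta_by_roots(alpha, T)
A = np.eye(T) + sum(a * np.eye(T, k=-l) for l, a in enumerate(alpha, start=1))
A_inv_series = sum(d_rec[i] * np.eye(T, k=-i) for i in range(T))   # sum_i delta_i L^i
```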

~(i) 

T1  V)  rM  '  1 


The  coefficient  of  in  the  j-th  equation  of  {3-bk)  has  the  form 


(3.61) 


•£\~1 


f  a  L£  1  LJ  L'k  j  l  a  L 

1=0  £  ~  U=0  £  ~ 


T-l-j  T-l-k  .  , .  , 

=  tr  l  l  S  8.  LS+J  L  1+k 

*-*  u  rr  n 


g=0  i=0 


g  1  ~ 


j  ,  k  =  1. 


A matrix $L^{h} L'^{\ell}$ has all elements 0 except along the diagonal $h - \ell$ entries below the main diagonal, which consists of 1's and 0's. In particular, $L^{h} L'^{\ell}$ has only 0's on the main diagonal if $h \ne \ell$, and $L^{h} L'^{h}$ has 1's on the main diagonal except for the first $h$ entries being 0. Hence

(3.62)   $\operatorname{tr} L^{h} L'^{\ell} = 0, \qquad h \ne \ell,$

(3.63)   $\operatorname{tr} L^{h} L'^{h} = T - h, \qquad h = 0, 1, \ldots, T-1,$

(3.64)   $\operatorname{tr} L^{h} L'^{h} = 0, \qquad h = T, T+1, \ldots .$
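The trace identities (3.62) to (3.64) are easy to tabulate. The following is a small illustrative check (the helper name `power_of_L` and the value of `T` are arbitrary choices made here):

```python
import numpy as np

T = 6
L = np.eye(T, k=-1)   # 1's immediately below the main diagonal

def power_of_L(h):
    """L^h; equals the zero matrix once h >= T."""
    return np.linalg.matrix_power(L, h)

# traces[h, l] = tr L^h L'^l for h, l = 0, ..., T+1
traces = np.array([[np.trace(power_of_L(h) @ power_of_L(l).T)
                    for l in range(T + 2)]
                   for h in range(T + 2)])

# Expected pattern: 0 off the diagonal, T - h on the diagonal for h < T, 0 for h >= T
expected = np.diag([max(T - h, 0) for h in range(T + 2)])
```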


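Thus (3.61) is

(3.65)   $\operatorname{tr}\left(\sum_{\ell=0}^{q} \alpha_\ell L^{\ell}\right)^{-1} L^{j} L'^{k} \left(\sum_{\ell=0}^{q} \alpha_\ell L'^{\ell}\right)^{-1} = \sum_{i=0}^{T-1-\max(j,k)} \left[T - i - \max(j,k)\right] \delta_{i+|k-j|}\, \delta_i .$

Note that

(3.66)   $\sigma^2 \sum_{i=0}^{\infty} \delta_{i+|k-j|}\, \delta_i = \sigma_{AR}(k-j),$

where $\sigma_{AR}(k-j)$ is the $(k-j)$-th covariance of the autoregressive process corresponding to the coefficients $1, \alpha_1, \ldots, \alpha_q$ and variance $\sigma^2$. Thus (3.65) is approximately $T \sigma_{AR}(k-j)/\sigma^2$, especially if the roots of (3.58) are small and thus the series (3.66) converges rapidly. In particular

(3.67)   $\lim_{T \to \infty} \frac{1}{T} \operatorname{tr}\left[\left(\sum_{\ell=0}^{q} \alpha_\ell L^{\ell}\right)^{-1} L^{j} L'^{k} \left(\sum_{\ell=0}^{q} \alpha_\ell L'^{\ell}\right)^{-1}\right] = \frac{\sigma_{AR}(k-j)}{\sigma^2} .$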


There are various ways of calculating $\sigma_{AR}(h)$ given $\alpha_1, \ldots, \alpha_q$ and $\sigma^2$. [See Section 5.2 of T. W. Anderson (1971a), for example.]
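As a numerical illustration of (3.65) and (3.67), the sketch below compares the exact trace, the finite-$T$ sum, and the limiting series $\sum_i \delta_{i+|k-j|}\delta_i = \sigma_{AR}(k-j)/\sigma^2$. The function names, the parameter values, and the truncation length of the series are assumptions made for this example; the recursion for $\delta_i$ is the one implied by $A^{-1} = \sum_i \delta_i L^i$.

```python
import numpy as np

def inverse_coefficients(alpha, n):
    """delta_0..delta_{n-1} with delta_0 = 1 and sum_l alpha_l delta_{i-l} = 0 (alpha_0 = 1)."""
    q = len(alpha)
    delta = np.zeros(n)
    delta[0] = 1.0
    for i in range(1, n):
        for l in range(1, min(i, q) + 1):
            delta[i] -= alpha[l - 1] * delta[i - l]
    return delta

def exact_trace(alpha, T, j, m):
    """tr A^{-1} L^j L'^m A'^{-1} computed by brute force."""
    A = np.eye(T) + sum(a * np.eye(T, k=-(g + 1)) for g, a in enumerate(alpha))
    A_inv = np.linalg.inv(A)
    return np.trace(A_inv @ np.eye(T, k=-j) @ np.eye(T, k=-m).T @ A_inv.T)

def finite_T_sum(alpha, T, j, m):
    """Right-hand side of (3.65)."""
    delta = inverse_coefficients(alpha, T)
    top, d = max(j, m), abs(m - j)
    i = np.arange(T - top)                       # i = 0, ..., T-1-max(j, m)
    return np.sum((T - i - top) * delta[i + d] * delta[i])

def limiting_series(alpha, j, m, n_terms=2000):
    """Truncated version of sum_i delta_{i+|m-j|} delta_i, i.e. sigma_AR(m-j)/sigma^2."""
    d = abs(m - j)
    delta = inverse_coefficients(alpha, n_terms + d)
    return np.sum(delta[d:d + n_terms] * delta[:n_terms])

alpha = [0.5, -0.3]   # illustrative; roots of z^2 + 0.5 z - 0.3 = 0 lie inside the unit circle
T = 400
results = {(j, m): (exact_trace(alpha, T, j, m),
                    finite_T_sum(alpha, T, j, m),
                    limiting_series(alpha, j, m))
           for j in (1, 2) for m in (1, 2)}
```

For each pair `(j, m)` the first two entries agree to rounding error, and the first entry divided by `T` is close to the third, with the gap shrinking as `T` grows.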

The equations (3.44) are approximately

(3.68)   $\sum_{k=1}^{q} \sigma_{AR}^{(i-1)}(k-j)\, \hat\alpha_k^{(i)} = d_j, \qquad j = 1, \ldots, q,$


where 
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(3.69) 


d  .  =  i- 


<1 

I  ao~  "  r 


u=o 


~2 

ai-l 


l 


\-1 

t 

y 

1 

1=0 


a^1  l£\ 


a, 


(i-l)  t£\-1 


1=0 


T 


tr 


1=0 


S(i_1)  L£ 
an  £ 


,-i  ' 

Lg  i 

\ 


l  a!1-1'  l'A'1 


i,=0 


•••  3  Q. 


The $q \times q$ matrix whose elements are $\sigma_{AR}^{(i-1)}(k-j)$ consists of the covariances of an autoregressive process of order $q$, whose coefficients are $1, \hat\alpha_1^{(i-1)}, \ldots, \hat\alpha_q^{(i-1)}$. Then the solution to (3.68) is

(3.70)   $\hat\alpha_k^{(i)} = \sum_{j=1}^{q} f_{kj}\, d_j ,$

where $(f_{kj}) = [\sigma_{AR}(k-j)]^{-1}$. The elements $f_{kj}$ are the coefficients of the quadratic form of $v_1, \ldots, v_q$ having a normal distribution with covariance matrix $[\sigma_{AR}^{(i-1)}(k-j)]$. The matrix is


AR 


(3.71)  of 


i-l 


-(i-l) 

°i 


a, 


(i-l) 


s'1-1’ 

1  +  s'1-1’2 
s'1-1’  +  s'i-1)s'i-1> 


The  matrix  is  persymmetric ;  that  is,  it  is  symmetric  about  the  transverse 
diagonal.  If  q  is  odd  the  middle  term  is 


(3.72) 


-2 

ai-l 


1  +  a,1-1 ^  +  ...  +  aj1 


(q-i)/2 


The matrix is essentially derived in Section 6.2 of T. W. Anderson (1971).
It can further be shown that

rn 

cos  A,i  cos  Ak 


(3.73)  lim 


^brl  l 

\l=0 


a£  L£1  LJ*  L 


"tt-T-  L‘ 


fix) 


dA 
4. Estimation of Coefficients of Linear Transformations When a Covariance Matrix Has Linear Structure; Autoregressive Processes with Moving Average Residuals.


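Let

(4.1)   $\sum_{\ell=0}^{p} \beta_\ell K_\ell\, y = u ,$

where $\mathcal{E} u = 0$ and

(4.2)   $\Sigma(u) = \mathcal{E} u u' = \sum_{g=0}^{q} \sigma_g G_g ,$

$G_0, G_1, \ldots, G_q$ are $q + 1$ known linearly independent symmetric $T \times T$ matrices and $\sigma_0, \sigma_1, \ldots, \sigma_q$ are $q + 1$ parameters such that $\sum_{g=0}^{q} \sigma_g G_g$ is positive definite. Then $y$ has mean vector $\mathcal{E} y = 0$ and covariance matrix

(4.3)   $\Sigma(y) = \left(\sum_{\ell=0}^{p} \beta_\ell K_\ell\right)^{-1} \sum_{g=0}^{q} \sigma_g G_g \left(\sum_{\ell=0}^{p} \beta_\ell K_\ell'\right)^{-1}$

with inverse

(4.4)   $\Sigma^{-1}(y) = \sum_{k=0}^{p} \beta_k K_k' \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} \sum_{\ell=0}^{p} \beta_\ell K_\ell = \sum_{k,\ell=0}^{p} \beta_k \beta_\ell K_k' \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} K_\ell .$

We assume $\beta_0 = 1$.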

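If $u$ is normally distributed, then $2/N$ times the logarithm of the likelihood is

(4.5)   $\frac{2}{N} \log L = -T \log 2\pi + 2 \log\left|\sum_{\ell=0}^{p} \beta_\ell K_\ell\right| - \log\left|\sum_{g=0}^{q} \sigma_g G_g\right| - \operatorname{tr} \sum_{k,\ell=0}^{p} \beta_k \beta_\ell K_k' \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} K_\ell\, C .$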

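The partial derivatives are

(4.6)   $\frac{\partial}{\partial \sigma_f} \frac{2}{N} \log L = -\operatorname{tr}\left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} G_f + \operatorname{tr} \sum_{k=0}^{p} \beta_k K_k' \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} G_f \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} \sum_{\ell=0}^{p} \beta_\ell K_\ell\, C , \qquad f = 0, 1, \ldots, q,$

(4.7)   $\frac{\partial}{\partial \beta_\ell} \frac{2}{N} \log L = 2 \operatorname{tr}\left(\sum_{k=0}^{p} \beta_k K_k\right)^{-1} K_\ell - 2 \operatorname{tr} \sum_{k=0}^{p} \beta_k K_k' \left(\sum_{g=0}^{q} \sigma_g G_g\right)^{-1} K_\ell\, C , \qquad \ell = 1, \ldots, p.$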

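In case $K_k = L^{k}$, $k = 0, 1, \ldots, p$, $N = 1$, and $y^{(k)} = L^{k} y$, the derivative equations are

(4.8)   $\operatorname{tr}\left(\sum_{g=0}^{q} \hat\sigma_g G_g\right)^{-1} G_f = \sum_{k,\ell=0}^{p} \hat\beta_k \hat\beta_\ell\, y^{(k)\prime} \left(\sum_{g=0}^{q} \hat\sigma_g G_g\right)^{-1} G_f \left(\sum_{g=0}^{q} \hat\sigma_g G_g\right)^{-1} y^{(\ell)} , \qquad f = 0, 1, \ldots, q,$

(4.9)   $\sum_{k=1}^{p} \hat\beta_k\, y^{(k)\prime} \left(\sum_{g=0}^{q} \hat\sigma_g G_g\right)^{-1} y^{(\ell)} = - y^{(0)\prime} \left(\sum_{g=0}^{q} \hat\sigma_g G_g\right)^{-1} y^{(\ell)} , \qquad \ell = 1, \ldots, p.$

The second partial derivatives of $(2/N) \log L$ are

(4.10)   $\frac{\partial^2}{\partial \sigma_f\, \partial \sigma_h} \frac{2}{N} \log L = \operatorname{tr}\left(\sum_{g} \sigma_g G_g\right)^{-1} G_f \left(\sum_{g} \sigma_g G_g\right)^{-1} G_h - 2 \operatorname{tr} \sum_{k=0}^{p} \beta_k K_k' \left(\sum_{g} \sigma_g G_g\right)^{-1} G_f \left(\sum_{g} \sigma_g G_g\right)^{-1} G_h \left(\sum_{g} \sigma_g G_g\right)^{-1} \sum_{\ell=0}^{p} \beta_\ell K_\ell\, C , \qquad f, h = 0, 1, \ldots, q,$

(4.11)   $\frac{\partial^2}{\partial \sigma_f\, \partial \beta_\ell} \frac{2}{N} \log L = 2 \operatorname{tr} \sum_{k=0}^{p} \beta_k K_k' \left(\sum_{g} \sigma_g G_g\right)^{-1} G_f \left(\sum_{g} \sigma_g G_g\right)^{-1} K_\ell\, C , \qquad f = 0, 1, \ldots, q, \quad \ell = 1, \ldots, p,$

(4.12)   $\frac{\partial^2}{\partial \beta_\ell\, \partial \beta_{\ell'}} \frac{2}{N} \log L = -2 \operatorname{tr}\left(\sum_{k=0}^{p} \beta_k K_k\right)^{-1} K_\ell \left(\sum_{k=0}^{p} \beta_k K_k\right)^{-1} K_{\ell'} - 2 \operatorname{tr} C K_{\ell'}' \left(\sum_{g} \sigma_g G_g\right)^{-1} K_\ell , \qquad \ell, \ell' = 1, \ldots, p.$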


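The information matrix has elements that are $N$ times

(4.13)   $-\mathcal{E} \frac{\partial^2}{\partial \sigma_f\, \partial \sigma_h} \frac{1}{N} \log L = \frac{1}{2} \operatorname{tr}\left(\sum_{g} \sigma_g G_g\right)^{-1} G_f \left(\sum_{g} \sigma_g G_g\right)^{-1} G_h , \qquad f, h = 0, 1, \ldots, q,$

(4.14)   $-\mathcal{E} \frac{\partial^2}{\partial \sigma_f\, \partial \beta_\ell} \frac{1}{N} \log L = -\operatorname{tr} G_f \left(\sum_{g} \sigma_g G_g\right)^{-1} K_\ell \left(\sum_{k=0}^{p} \beta_k K_k\right)^{-1} , \qquad f = 0, 1, \ldots, q, \quad \ell = 1, \ldots, p,$

(4.15)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_\ell\, \partial \beta_{\ell'}} \frac{1}{N} \log L = \operatorname{tr}\left(\sum_{k} \beta_k K_k\right)^{-1} K_\ell \left(\sum_{k} \beta_k K_k\right)^{-1} K_{\ell'} + \operatorname{tr}\left(\sum_{k} \beta_k K_k\right)^{-1} \sum_{g} \sigma_g G_g \left(\sum_{k} \beta_k K_k'\right)^{-1} K_{\ell'}' \left(\sum_{g} \sigma_g G_g\right)^{-1} K_\ell , \qquad \ell, \ell' = 1, \ldots, p.$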


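Let

(4.16)   $\hat\Sigma^{u}_{i-1} = \sum_{g=0}^{q} \hat\sigma_g^{(i-1)} G_g , \qquad \hat B_{i-1} = \sum_{\ell=0}^{p} \hat\beta_\ell^{(i-1)} K_\ell .$

The method of scoring leads to the following iterative procedure:

(4.17)   $\sum_{h=0}^{q} \operatorname{tr} (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} G_h \left(\hat\sigma_h^{(i)} - \hat\sigma_h^{(i-1)}\right) - 2 \sum_{\ell=1}^{p} \operatorname{tr} G_f (\hat\Sigma^{u}_{i-1})^{-1} K_\ell \hat B_{i-1}^{-1} \left(\hat\beta_\ell^{(i)} - \hat\beta_\ell^{(i-1)}\right) = -\operatorname{tr} (\hat\Sigma^{u}_{i-1})^{-1} G_f + \operatorname{tr} \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} \hat B_{i-1} C , \qquad f = 0, 1, \ldots, q,$

(4.18)   $-2 \sum_{h=0}^{q} \operatorname{tr} G_h (\hat\Sigma^{u}_{i-1})^{-1} K_j \hat B_{i-1}^{-1} \left(\hat\sigma_h^{(i)} - \hat\sigma_h^{(i-1)}\right) + 2 \sum_{\ell=1}^{p} \left[\operatorname{tr} \hat B_{i-1}^{-1} K_j \hat B_{i-1}^{-1} K_\ell + \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} K_j' (\hat\Sigma^{u}_{i-1})^{-1} K_\ell\right] \left(\hat\beta_\ell^{(i)} - \hat\beta_\ell^{(i-1)}\right) = 2 \operatorname{tr} \hat B_{i-1}^{-1} K_j - 2 \operatorname{tr} C \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} K_j , \qquad j = 1, \ldots, p.$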


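These equations are equivalent to

(4.19)   $\sum_{h=0}^{q} \operatorname{tr} (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} G_h\, \hat\sigma_h^{(i)} - 2 \sum_{\ell=1}^{p} \operatorname{tr} G_f (\hat\Sigma^{u}_{i-1})^{-1} K_\ell \hat B_{i-1}^{-1}\, \hat\beta_\ell^{(i)} = -2 \operatorname{tr} G_f (\hat\Sigma^{u}_{i-1})^{-1} + \operatorname{tr} \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} \hat B_{i-1} C + 2 \operatorname{tr} G_f (\hat\Sigma^{u}_{i-1})^{-1} K_0 \hat B_{i-1}^{-1} , \qquad f = 0, 1, \ldots, q,$

(4.20)   $-2 \sum_{h=0}^{q} \operatorname{tr} G_h (\hat\Sigma^{u}_{i-1})^{-1} K_j \hat B_{i-1}^{-1}\, \hat\sigma_h^{(i)} + 2 \sum_{\ell=1}^{p} \left[\operatorname{tr} \hat B_{i-1}^{-1} K_j \hat B_{i-1}^{-1} K_\ell + \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} K_j' (\hat\Sigma^{u}_{i-1})^{-1} K_\ell\right] \hat\beta_\ell^{(i)} = 4 \operatorname{tr} \hat B_{i-1}^{-1} K_j - 2 \operatorname{tr} C \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} K_j - 2 \operatorname{tr} \hat B_{i-1}^{-1} K_j \hat B_{i-1}^{-1} K_0 - 2 \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} K_j' (\hat\Sigma^{u}_{i-1})^{-1} K_0 , \qquad j = 1, \ldots, p.$

If $K_j = L^{j}$, then $\operatorname{tr} \hat B_{i-1}^{-1} K_j = 0$ and $\operatorname{tr} \hat B_{i-1}^{-1} K_j \hat B_{i-1}^{-1} K_\ell = 0$, $j = 1, \ldots, p$, $\ell = 0, 1, \ldots, p$. Then (4.20) is

(4.21)   $-2 \sum_{h=0}^{q} \operatorname{tr} G_h (\hat\Sigma^{u}_{i-1})^{-1} L^{j} \hat B_{i-1}^{-1}\, \hat\sigma_h^{(i)} + 2 \sum_{\ell=1}^{p} \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} L'^{j} (\hat\Sigma^{u}_{i-1})^{-1} L^{\ell}\, \hat\beta_\ell^{(i)} = -2 \operatorname{tr} C \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} L^{j} - 2 \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} L'^{j} (\hat\Sigma^{u}_{i-1})^{-1} , \qquad j = 1, \ldots, p.$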
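The matrix of coefficients of $\hat\sigma_0^{(i)}, \ldots, \hat\sigma_q^{(i)}, \hat\beta_1^{(i)}, \ldots, \hat\beta_p^{(i)}$ is

(4.22)   $\begin{pmatrix} \operatorname{tr} (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} G_h & -2 \operatorname{tr} G_f (\hat\Sigma^{u}_{i-1})^{-1} L^{\ell} \hat B_{i-1}^{-1} \\ -2 \operatorname{tr} G_h (\hat\Sigma^{u}_{i-1})^{-1} L^{j} \hat B_{i-1}^{-1} & 2 \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} L'^{j} (\hat\Sigma^{u}_{i-1})^{-1} L^{\ell} \end{pmatrix} .$

If $C = y y'$, the right-hand side of (4.21) is

(4.23)   $-2\, y' \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} L^{j} y - 2 \operatorname{tr} \hat B_{i-1}^{-1} \hat\Sigma^{u}_{i-1} \hat B_{i-1}'^{-1} L'^{j} (\hat\Sigma^{u}_{i-1})^{-1} ,$

and the quadratic form on the right-hand side of (4.19) is

(4.24)   $y' \hat B_{i-1}' (\hat\Sigma^{u}_{i-1})^{-1} G_f (\hat\Sigma^{u}_{i-1})^{-1} \hat B_{i-1}\, y .$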

When $\Sigma^{u}$ is to represent the covariance matrix of a moving average process, $G_0 = I$,

(4.25)   $G_g = L^{g} + L'^{g} , \qquad g = 1, \ldots, q,$

and

(4.26)   $\sigma_g = \sigma^2 \sum_{j=0}^{q-g} \alpha_j \alpha_{j+g} , \qquad g = 0, 1, \ldots, q.$
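To make the linear structure (4.25) and (4.26) concrete, the sketch below assembles $\sum_g \sigma_g G_g$ and compares it with the covariance matrix of $u_t = \sum_j \alpha_j v_{t-j}$ computed directly from the moving average weights. The function names and parameter values are illustrative assumptions; the direct computation uses an auxiliary $T \times (T+q)$ weight matrix introduced only for this check.

```python
import numpy as np

def ma_covariance_linear_structure(alpha, sigma2, T):
    """sum_g sigma_g G_g with G_0 = I, G_g = L^g + L'^g, sigma_g = sigma^2 sum_j a_j a_{j+g}."""
    a = np.concatenate(([1.0], alpha))           # alpha_0 = 1
    q = len(a) - 1
    Sigma = np.zeros((T, T))
    for g in range(q + 1):
        sigma_g = sigma2 * np.dot(a[:q + 1 - g], a[g:])
        G = np.eye(T) if g == 0 else np.eye(T, k=-g) + np.eye(T, k=g)
        Sigma += sigma_g * G
    return Sigma

def ma_covariance_direct(alpha, sigma2, T):
    """Covariance of u_t = sum_j alpha_j v_{t-j}, t = 1..T, from the weights themselves."""
    a = np.concatenate(([1.0], alpha))
    q = len(a) - 1
    M = np.zeros((T, T + q))                     # u = M v, v of length T + q
    for t in range(T):
        for j in range(q + 1):
            M[t, t + q - j] = a[j]
    return sigma2 * M @ M.T

alpha = [0.5, -0.3]      # illustrative values
sigma2 = 2.0
T = 7
Sigma_structured = ma_covariance_linear_structure(alpha, sigma2, T)
Sigma_direct = ma_covariance_direct(alpha, sigma2, T)
```

The two matrices coincide: the stationary moving average covariance is a banded Toeplitz matrix, which is exactly what the linear combination of $I$ and $L^{g} + L'^{g}$ produces.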


Since $L^{h} L'^{g}$ is $L^{h-g}$, $h \ge g$, except for at most $h$ 1's being replaced by 0's, $\hat\Sigma^{u}_{i-1}$ and $\hat B_{i-1}$ almost commute and the lower right-hand corner of (4.22) is approximately

(4.27)   $2 \operatorname{tr} \hat B_{i-1}^{-1} L^{\ell} L'^{j} \hat B_{i-1}'^{-1} .$
5. Estimation of Coefficients of Linear Transformation; Autoregressive Processes with Moving Average Residuals


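Here we combine Sections 2 and 3. Let

(5.1)   $\sum_{\ell=0}^{p} \beta_\ell K_\ell\, y = \sum_{k=0}^{q} \alpha_k J_k\, v ,$

where $K_0, K_1, \ldots, K_p$ are $p + 1$ known linearly independent $T \times T$ matrices, $J_0, J_1, \ldots, J_q$ are $q + 1$ known linearly independent matrices, $\beta_0 = \alpha_0 = 1$, $\beta_1, \ldots, \beta_p, \alpha_1, \ldots, \alpha_q$ are $p + q$ parameters, and $v$ is a $T$-component random vector with mean vector $\mathcal{E} v = 0$ and covariance matrix $\Sigma(v) = \sigma^2 I$. Then

(5.2)   $y = \left(\sum_{\ell=0}^{p} \beta_\ell K_\ell\right)^{-1} \sum_{k=0}^{q} \alpha_k J_k\, v$

has mean vector 0 and covariance matrix

(5.3)   $\Sigma(y) = \sigma^2 \left(\sum_{\ell=0}^{p} \beta_\ell K_\ell\right)^{-1} \sum_{k=0}^{q} \alpha_k J_k \sum_{k=0}^{q} \alpha_k J_k' \left(\sum_{\ell=0}^{p} \beta_\ell K_\ell'\right)^{-1} = \sigma^2 B^{-1} A A' B'^{-1} ,$

where $A = \sum_{k=0}^{q} \alpha_k J_k$ and $B = \sum_{\ell=0}^{p} \beta_\ell K_\ell$.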

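If $y_1, \ldots, y_N$ are $N$ observations on $y$ with a normal distribution, $2/N$ times the logarithm of the likelihood function $L$ is

(5.4)   $\frac{2}{N} \log L = -T \log 2\pi - T \log \sigma^2 + 2 \log\left|\sum_{\ell=0}^{p} \beta_\ell K_\ell\right| - 2 \log\left|\sum_{k=0}^{q} \alpha_k J_k\right| - \frac{1}{\sigma^2} \operatorname{tr} \sum_{\ell', \ell=0}^{p} \beta_{\ell'} \beta_\ell K_{\ell'}' \left(\sum_{k=0}^{q} \alpha_k J_k'\right)^{-1} \left(\sum_{k=0}^{q} \alpha_k J_k\right)^{-1} K_\ell\, C .$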


The partial derivatives are

(5.5)   $\frac{\partial}{\partial \alpha_g} \frac{2}{N} \log L = -2 \operatorname{tr} A^{-1} J_g + \frac{2}{\sigma^2} \operatorname{tr} A^{-1} B C B' A'^{-1} J_g' A'^{-1} , \qquad g = 1, \ldots, q,$

(5.6)   $\frac{\partial}{\partial \beta_h} \frac{2}{N} \log L = 2 \operatorname{tr} B^{-1} K_h - \frac{2}{\sigma^2} \operatorname{tr} B' A'^{-1} A^{-1} K_h C , \qquad h = 1, \ldots, p,$

(5.7)   $\frac{\partial}{\partial \sigma^2} \frac{2}{N} \log L = -\frac{T}{\sigma^2} + \frac{1}{\sigma^4} \operatorname{tr} A^{-1} B C B' A'^{-1} ,$

with $A = \sum_{k=0}^{q} \alpha_k J_k$ and $B = \sum_{\ell=0}^{p} \beta_\ell K_\ell$ as in (5.3). The maximum likelihood estimates are defined by setting the derivatives equal to 0.
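The derivative formulas (5.5) to (5.7) can be sanity-checked against finite differences of (5.4). The sketch below is an illustration under stated assumptions: the function names, the choice $J_g = L^g$, $K_h = L^h$, the parameter values, and the randomly generated matrix `C` are all made up for the check and are not taken from the paper.

```python
import numpy as np

def combine(coefs, mats):
    """mats[0] + sum_k coefs[k-1] * mats[k]; the leading coefficient is fixed at 1."""
    return mats[0] + sum(c * M for c, M in zip(coefs, mats[1:]))

def loglik_2_over_N(alpha, beta, sigma2, J, K, C):
    """(2/N) log L as in (5.4), written with A and B."""
    T = C.shape[0]
    A, B = combine(alpha, J), combine(beta, K)
    W = np.linalg.solve(A, B)                     # A^{-1} B
    logdet_A = np.linalg.slogdet(A)[1]
    logdet_B = np.linalg.slogdet(B)[1]
    return (-T * np.log(2 * np.pi) - T * np.log(sigma2)
            + 2 * logdet_B - 2 * logdet_A - np.trace(W @ C @ W.T) / sigma2)

def gradient(alpha, beta, sigma2, J, K, C):
    """Analytic derivatives (5.5), (5.6), (5.7), stacked in that order."""
    T = C.shape[0]
    A, B = combine(alpha, J), combine(beta, K)
    A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
    M = A_inv @ B @ C @ B.T @ A_inv.T
    d_alpha = [-2 * np.trace(A_inv @ Jg) + 2 / sigma2 * np.trace(M @ Jg.T @ A_inv.T)
               for Jg in J[1:]]
    d_beta = [2 * np.trace(B_inv @ Kh) - 2 / sigma2 * np.trace(B.T @ A_inv.T @ A_inv @ Kh @ C)
              for Kh in K[1:]]
    d_sigma2 = -T / sigma2 + np.trace(M) / sigma2**2
    return np.array(d_alpha + d_beta + [d_sigma2])

T = 6
J = [np.eye(T, k=-g) for g in range(3)]           # I, L, L^2  (q = 2)
K = [np.eye(T, k=-h) for h in range(2)]           # I, L       (p = 1)
rng = np.random.default_rng(0)
Y = rng.standard_normal((T, 5))
C = Y @ Y.T / 5                                   # stand-in for the sample second-moment matrix
theta = np.array([0.4, -0.2, 0.3, 1.5])           # alpha_1, alpha_2, beta_1, sigma^2

def objective(th):
    return loglik_2_over_N(th[:2], th[2:3], th[3], J, K, C)

step = 1e-6
numeric = np.array([(objective(theta + step * e) - objective(theta - step * e)) / (2 * step)
                    for e in np.eye(4)])
analytic = gradient(theta[:2], theta[2:3], theta[3], J, K, C)
```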

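The second partial derivatives of $(2/N) \log L$ are

(5.8)   $\frac{\partial^2}{\partial \alpha_g\, \partial \alpha_f} \frac{2}{N} \log L = 2 \operatorname{tr} A^{-1} J_g A^{-1} J_f - \frac{2}{\sigma^2} \operatorname{tr} A^{-1} J_f A^{-1} B C B' A'^{-1} J_g' A'^{-1} - \frac{2}{\sigma^2} \operatorname{tr} A^{-1} B C B' A'^{-1} J_f' A'^{-1} J_g' A'^{-1} - \frac{2}{\sigma^2} \operatorname{tr} A^{-1} B C B' A'^{-1} J_g' A'^{-1} J_f' A'^{-1} , \qquad g, f = 1, \ldots, q,$

(5.9)   $\frac{\partial^2}{\partial \alpha_g\, \partial \beta_h} \frac{2}{N} \log L = \frac{2}{\sigma^2} \operatorname{tr} A^{-1} K_h C B' A'^{-1} J_g' A'^{-1} + \frac{2}{\sigma^2} \operatorname{tr} A^{-1} J_g A^{-1} K_h C B' A'^{-1} , \qquad g = 1, \ldots, q, \quad h = 1, \ldots, p,$

(5.10)   $\frac{\partial^2}{\partial \beta_h\, \partial \beta_j} \frac{2}{N} \log L = -2 \operatorname{tr} B^{-1} K_j B^{-1} K_h - \frac{2}{\sigma^2} \operatorname{tr} K_j' A'^{-1} A^{-1} K_h C , \qquad h, j = 1, \ldots, p,$

(5.11)   $\frac{\partial^2}{\partial \alpha_g\, \partial \sigma^2} \frac{2}{N} \log L = -\frac{2}{\sigma^4} \operatorname{tr} A^{-1} B C B' A'^{-1} J_g' A'^{-1} , \qquad g = 1, \ldots, q,$

(5.12)   $\frac{\partial^2}{\partial \beta_h\, \partial \sigma^2} \frac{2}{N} \log L = \frac{2}{\sigma^4} \operatorname{tr} A^{-1} K_h C B' A'^{-1} , \qquad h = 1, \ldots, p,$

(5.13)   $\frac{\partial^2}{\partial (\sigma^2)^2} \frac{2}{N} \log L = \frac{T}{\sigma^4} - \frac{2}{\sigma^6} \operatorname{tr} A^{-1} B C B' A'^{-1} .$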

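The elements of the information matrix are $N$ times

(5.14)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \alpha_f} \frac{1}{N} \log L = \operatorname{tr} A^{-1} J_g A^{-1} J_f + \operatorname{tr} A^{-1} J_g J_f' A'^{-1} , \qquad g, f = 1, \ldots, q,$

(5.15)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \beta_h} \frac{1}{N} \log L = -\operatorname{tr} J_g' A'^{-1} A^{-1} K_h B^{-1} A - \operatorname{tr} J_g A^{-1} K_h B^{-1} , \qquad g = 1, \ldots, q, \quad h = 1, \ldots, p,$

(5.16)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_h\, \partial \beta_j} \frac{1}{N} \log L = \operatorname{tr} B^{-1} K_j B^{-1} K_h + \operatorname{tr} K_j' A'^{-1} A^{-1} K_h B^{-1} A A' B'^{-1} , \qquad h, j = 1, \ldots, p,$

(5.17)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \sigma^2} \frac{1}{N} \log L = \frac{1}{\sigma^2} \operatorname{tr} J_g' A'^{-1} , \qquad g = 1, \ldots, q,$

(5.18)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_h\, \partial \sigma^2} \frac{1}{N} \log L = -\frac{1}{\sigma^2} \operatorname{tr} K_h B^{-1} , \qquad h = 1, \ldots, p,$

(5.19)   $-\mathcal{E} \frac{\partial^2}{\partial (\sigma^2)^2} \frac{1}{N} \log L = \frac{T}{2 \sigma^4} .$

The method of scoring can be developed from these results.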


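If $J_g = K_g = L^{g}$, then the elements of the information matrix are $N$ times

(5.20)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \alpha_f} \frac{1}{N} \log L = \operatorname{tr} A^{-1} L^{g} L'^{f} A'^{-1} , \qquad g, f = 1, \ldots, q,$

(5.21)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \beta_h} \frac{1}{N} \log L = -\operatorname{tr} A^{-1} L^{h} B^{-1} A L'^{g} A'^{-1} , \qquad g = 1, \ldots, q, \quad h = 1, \ldots, p,$

(5.22)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_h\, \partial \beta_j} \frac{1}{N} \log L = \operatorname{tr} A^{-1} L^{h} B^{-1} A A' B'^{-1} L'^{j} A'^{-1} , \qquad h, j = 1, \ldots, p,$

(5.23)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \sigma^2} \frac{1}{N} \log L = 0 , \qquad g = 1, \ldots, q,$

(5.24)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_h\, \partial \sigma^2} \frac{1}{N} \log L = 0 , \qquad h = 1, \ldots, p.$

Note that $L^{g}$, $g = 0, 1, \ldots$, $A$, $B$, $A^{-1}$ and $B^{-1}$ are polynomials in $L$ and hence commute. Thus (5.21) and (5.22) are

(5.25)   $-\mathcal{E} \frac{\partial^2}{\partial \alpha_g\, \partial \beta_h} \frac{1}{N} \log L = -\operatorname{tr} B^{-1} L^{h} L'^{g} A'^{-1} , \qquad g = 1, \ldots, q, \quad h = 1, \ldots, p,$

(5.26)   $-\mathcal{E} \frac{\partial^2}{\partial \beta_h\, \partial \beta_j} \frac{1}{N} \log L = \operatorname{tr} B^{-1} L^{h} L'^{j} B'^{-1} , \qquad h, j = 1, \ldots, p.$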
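The simplification from (5.21), (5.22) to (5.25), (5.26) rests only on the fact that polynomials in $L$ commute. A brief numerical check, with arbitrary illustrative coefficients and a hypothetical helper `poly_in_L`, is:

```python
import numpy as np

def poly_in_L(coefs, T):
    """I + sum_g coefs[g-1] L^g, a lower-triangular Toeplitz matrix."""
    return np.eye(T) + sum(c * np.eye(T, k=-(g + 1)) for g, c in enumerate(coefs))

T = 7
A = poly_in_L([0.4, -0.2], T)       # illustrative alpha_1, alpha_2
B = poly_in_L([-0.5], T)            # illustrative beta_1
A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
g, h, j = 2, 1, 1
Lg, Lh, Lj = np.eye(T, k=-g), np.eye(T, k=-h), np.eye(T, k=-j)

# Cross term: long form (5.21) versus short form (5.25)
cross_long = -np.trace(A_inv @ Lh @ B_inv @ A @ Lg.T @ A_inv.T)
cross_short = -np.trace(B_inv @ Lh @ Lg.T @ A_inv.T)

# beta-beta term: long form (5.22) versus short form (5.26)
bb_long = np.trace(A_inv @ Lh @ B_inv @ A @ A.T @ B_inv.T @ Lj.T @ A_inv.T)
bb_short = np.trace(B_inv @ Lh @ Lj.T @ B_inv.T)

# The traces that vanish in (5.23) and (5.24)
vanishing = [np.trace(Lg.T @ A_inv.T), np.trace(Lh @ B_inv)]
```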


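When $J_g = K_g = L^{g}$, then the method of scoring involves the solution of

(5.27)   $\sum_{f=1}^{q} \operatorname{tr} \hat A_{i-1}^{-1} L^{g} L'^{f} \hat A_{i-1}'^{-1}\, \hat\alpha_f^{(i)} - \sum_{h=1}^{p} \operatorname{tr} \hat A_{i-1}^{-1} L^{g} L'^{h} \hat B_{i-1}'^{-1}\, \hat\beta_h^{(i)} = \frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat A_{i-1}^{-1} \hat B_{i-1} C \hat B_{i-1}' \hat A_{i-1}'^{-1} L'^{g} \hat A_{i-1}'^{-1} + \operatorname{tr} \hat A_{i-1}^{-1} L^{g} \left(\hat B_{i-1}'^{-1} - \hat A_{i-1}'^{-1}\right) , \qquad g = 1, \ldots, q,$

(5.28)   $-\sum_{f=1}^{q} \operatorname{tr} \hat B_{i-1}^{-1} L^{j} L'^{f} \hat A_{i-1}'^{-1}\, \hat\alpha_f^{(i)} + \sum_{h=1}^{p} \operatorname{tr} \hat B_{i-1}'^{-1} L'^{j} L^{h} \hat B_{i-1}^{-1}\, \hat\beta_h^{(i)} = -\frac{1}{\hat\sigma_{i-1}^2} \operatorname{tr} \hat A_{i-1}^{-1} L^{j} C \hat B_{i-1}' \hat A_{i-1}'^{-1} + \operatorname{tr} \left(\hat A_{i-1}'^{-1} - \hat B_{i-1}'^{-1}\right) L^{j} \hat B_{i-1}^{-1} , \qquad j = 1, \ldots, p,$

(5.29)   $\hat\sigma_i^2 = \frac{1}{T} \operatorname{tr} \hat A_{i-1}^{-1} \hat B_{i-1} C \hat B_{i-1}' \hat A_{i-1}'^{-1} .$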
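A single scoring step can be sketched directly from the information elements (5.20), (5.25), (5.26) and the derivatives (5.5) to (5.7). The code below is a hedged illustration, not the paper's computational procedure: the function names are hypothetical, matrix inverses are formed explicitly for readability, and the right-hand sides are written in the form obtained by moving the previous iterate to the right of the scoring equations. A simple consistency property is that when `C` equals the model covariance $\sigma^2 B^{-1} A A' B'^{-1}$ at the current parameter values, the step leaves those values unchanged.

```python
import numpy as np

def shift(T, g):
    """L^g for the T x T lag matrix L."""
    return np.eye(T, k=-g)

def poly_in_L(coefs, T):
    """I + sum_g coefs[g-1] L^g."""
    return np.eye(T) + sum(c * shift(T, g + 1) for g, c in enumerate(coefs))

def scoring_step(alpha, beta, sigma2, C):
    """One scoring iteration for J_g = K_g = L^g; returns (alpha_new, beta_new, sigma2_new)."""
    T = C.shape[0]
    q, p = len(alpha), len(beta)
    A, B = poly_in_L(alpha, T), poly_in_L(beta, T)
    A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)
    M = A_inv @ B @ C @ B.T @ A_inv.T
    X = [A_inv @ shift(T, g) for g in range(1, q + 1)]   # A^{-1} L^g
    Y = [B_inv @ shift(T, h) for h in range(1, p + 1)]   # B^{-1} L^h
    lhs = np.zeros((q + p, q + p))
    rhs = np.zeros(q + p)
    for g in range(q):
        for f in range(q):
            lhs[g, f] = np.trace(X[g] @ X[f].T)           # tr A^{-1} L^g L'^f A'^{-1}
        for h in range(p):
            lhs[g, q + h] = -np.trace(X[g] @ Y[h].T)      # -tr A^{-1} L^g L'^h B'^{-1}
        rhs[g] = (np.trace(M @ shift(T, g + 1).T @ A_inv.T) / sigma2
                  + np.trace(X[g] @ (B_inv.T - A_inv.T)))
    for j in range(p):
        for f in range(q):
            lhs[q + j, f] = -np.trace(Y[j] @ X[f].T)      # -tr B^{-1} L^j L'^f A'^{-1}
        for h in range(p):
            lhs[q + j, q + h] = np.trace(Y[j] @ Y[h].T)   # tr B^{-1} L^j L'^h B'^{-1}
        rhs[q + j] = (-np.trace(A_inv @ shift(T, j + 1) @ C @ B.T @ A_inv.T) / sigma2
                      + np.trace((A_inv.T - B_inv.T) @ shift(T, j + 1) @ B_inv))
    theta = np.linalg.solve(lhs, rhs)
    return theta[:q], theta[q:], np.trace(M) / T

# Fixed-point check with illustrative parameter values
T = 12
alpha0, beta0, sigma2_0 = np.array([0.4]), np.array([-0.5, 0.2]), 2.0
A0, B0 = poly_in_L(alpha0, T), poly_in_L(beta0, T)
B0_inv = np.linalg.inv(B0)
C_model = sigma2_0 * B0_inv @ A0 @ A0.T @ B0_inv.T
alpha1, beta1, sigma2_1 = scoring_step(alpha0, beta0, sigma2_0, C_model)
```

In practice `C` would be $y y'$ (or the average over $N$ observations) and the step would be repeated from consistent starting values; solving triangular systems instead of forming `A_inv` and `B_inv` would be the more economical choice for large `T`.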


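If $N = 1$, $y_1 = y$, and $C = y y'$, the right-hand sides of (5.27), (5.28), and (5.29) are, respectively,

(5.30)   $\frac{1}{\hat\sigma_{i-1}^2} \left(\hat A_{i-1}^{-1} \hat B_{i-1} y\right)' L'^{g} \hat A_{i-1}'^{-1} \left(\hat A_{i-1}^{-1} \hat B_{i-1} y\right) + \operatorname{tr} \hat A_{i-1}^{-1} L^{g} \left(\hat B_{i-1}'^{-1} - \hat A_{i-1}'^{-1}\right) , \qquad g = 1, \ldots, q,$

(5.31)   $-\frac{1}{\hat\sigma_{i-1}^2} \left(\hat A_{i-1}^{-1} L^{j} y\right)' \left(\hat A_{i-1}^{-1} \hat B_{i-1} y\right) + \operatorname{tr} \left(\hat A_{i-1}'^{-1} - \hat B_{i-1}'^{-1}\right) L^{j} \hat B_{i-1}^{-1} , \qquad j = 1, \ldots, p,$

(5.32)   $\frac{1}{T} \left(\hat A_{i-1}^{-1} \hat B_{i-1} y\right)' \left(\hat A_{i-1}^{-1} \hat B_{i-1} y\right) .$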


6.  Asymptotic  Theory 

The exact distributions of the maximum likelihood estimates developed in this paper cannot be obtained in closed form in general. However, asymptotic distributions can be found. If $N \to \infty$ we have the case of repeated observations on the random vector $y$; in the case of time series, however, $N$ may be 1 and $T \to \infty$. In either case when consistent estimates of the parameters are used as initial estimates, the estimates obtained in the first step of the iteration procedure are consistent, asymptotically normal, and asymptotically efficient (when normalized by $\sqrt{T}$ or $\sqrt{N}$, as the case may be).

In the model of Section 2.1 no iteration is involved and the asymptotic properties are the usual ones as the number of observations $N$ increases. The model of Section 2.2 is the autoregressive model with the first $p$ observations treated as fixed ($y_{-p+1} = \cdots = y_0 = 0$); the asymptotic theory as $T \to \infty$ is well known. [See T. W. Anderson (1971), for example.]

For each of the models in the other sections [as well as the model $\Sigma = \sum_g \sigma_g G_g$ treated in T. W. Anderson (1971b), (1973)] an iterative procedure was proposed. If the initial estimates are consistent, the matrix of coefficients of the linear equations is a consistent estimate of the information matrix of one observation. The asymptotic distribution of the right-hand sides is normal with covariance matrix equal to this matrix. It then follows that the estimates have the stated properties. We shall carry out the details of the proof only for the model of Section 3.2, which shows the pattern.

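Let $y = (y_1, \ldots, y_T)'$ be defined by

(6.1)   $y = \sum_{k=0}^{q} \alpha_k L^{k} v = A v .$

We shall let $T \to \infty$. We assume that the roots of (3.58) are less than 1 in absolute value. Then (3.44) and (3.45) for $i = 1$ are

(6.2)   $\sum_{j=1}^{q} \operatorname{tr} \hat A_0^{-1} L^{g} L'^{j} \hat A_0'^{-1}\, \hat\alpha_j^{(1)} = \frac{1}{\hat\sigma_0^2}\, y' \hat A_0'^{-1} \hat A_0^{-1} L^{g} \hat A_0^{-1} y - \operatorname{tr} \hat A_0^{-1} L^{g} \hat A_0'^{-1} , \qquad g = 1, \ldots, q,$

(6.3)   $\hat\sigma_1^2 = \frac{1}{T}\, y' \hat A_0'^{-1} \hat A_0^{-1} y .$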

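We shall show that

(6.4)   $\operatorname{plim}_{T \to \infty} \frac{1}{T} \operatorname{tr} \hat A_0^{-1} L^{g} L'^{j} \hat A_0'^{-1} = \lim_{T \to \infty} \frac{1}{T} \operatorname{tr} A^{-1} L^{g} L'^{j} A'^{-1} .$

The right-hand side is given by (3.67). The left-hand side is

(6.5)   $\sum_{i=0}^{T-1-\max(g,j)} \left[1 - \frac{i + \max(g,j)}{T}\right] \hat\delta^{(0)}_{i+|j-g|}\, \hat\delta^{(0)}_i ,$

where $\hat\delta^{(0)}_0 = 1$, $\hat\delta^{(0)}_i$, $i = 1, \ldots$, constitute the solutions to (3.55) and (3.56) with $\alpha_\ell$ replaced by $\hat\alpha^{(0)}_\ell$, $\ell = 1, \ldots, q$. With arbitrarily high probability $\hat\alpha^{(0)}_1, \ldots, \hat\alpha^{(0)}_q$ are such that the roots of the polynomial equation with these coefficients are less than 1 in absolute value, in fact, are less than $\rho < 1$ for some $\rho$ [greater than the largest root of (3.58)]. Then (6.5) converges in probability to

(6.6)   $\sum_{i=0}^{\infty} \delta_{i+|g-j|}\, \delta_i = \frac{\sigma_{AR}(g-j)}{\sigma^2} .$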

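We can write (6.2) as

(6.7)   $\sum_{j=1}^{q} \frac{1}{T} \operatorname{tr} \hat A_0^{-1} L^{g} L'^{j} \hat A_0'^{-1}\, \sqrt{T} \left(\hat\alpha_j^{(1)} - \alpha_j\right) = \frac{1}{\sqrt{T}\, \hat\sigma_0^2}\, y' \hat A_0'^{-1} \hat A_0^{-1} L^{g} \hat A_0^{-1} y - \frac{1}{\sqrt{T}} \operatorname{tr} \hat A_0^{-1} L^{g} A' \hat A_0'^{-1} , \qquad g = 1, \ldots, q.$

We want to show that the right-hand sides have a limiting normal distribution with means 0 and covariance matrix (6.4).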

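Consider

(6.8)   $\frac{1}{\sqrt{T} \sigma^2}\, y' A'^{-1} A^{-1} L^{g} A^{-1} y = \frac{1}{\sqrt{T} \sigma^2}\, v' A^{-1} L^{g} v = \sum_{i=0}^{T-1} \delta_i \frac{1}{\sqrt{T} \sigma^2}\, v' L^{i+g} v = \sum_{i=0}^{T-g-1} \delta_i \frac{1}{\sqrt{T} \sigma^2} \sum_{t=1}^{T-(i+g)} v_t v_{t+i+g} .$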
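The chain of equalities in (6.8), apart from the common factor $1/(\sqrt{T}\sigma^2)$, can be verified on a simulated vector. The sketch below uses an arbitrary seed, illustrative coefficients, and a hypothetical helper for the $\delta_i$ recursion implied by $A^{-1} = \sum_i \delta_i L^i$:

```python
import numpy as np

def inverse_coefficients(alpha, n):
    """delta_0..delta_{n-1} with delta_0 = 1 and sum_l alpha_l delta_{i-l} = 0 (alpha_0 = 1)."""
    q = len(alpha)
    delta = np.zeros(n)
    delta[0] = 1.0
    for i in range(1, n):
        for l in range(1, min(i, q) + 1):
            delta[i] -= alpha[l - 1] * delta[i - l]
    return delta

T, g = 30, 2
alpha = [0.5, -0.3]                              # illustrative values
rng = np.random.default_rng(0)
v = rng.standard_normal(T)

A = np.eye(T) + sum(a * np.eye(T, k=-(k + 1)) for k, a in enumerate(alpha))
y = A @ v                                        # y = A v as in (6.1)
A_inv = np.linalg.inv(A)
Lg = np.eye(T, k=-g)

form_in_y = y @ A_inv.T @ A_inv @ Lg @ A_inv @ y      # y' A'^{-1} A^{-1} L^g A^{-1} y
form_in_v = v @ A_inv @ Lg @ v                        # v' A^{-1} L^g v

delta = inverse_coefficients(alpha, T)
lagged_products = sum(delta[i] * np.dot(v[:T - (i + g)], v[i + g:])
                      for i in range(T - g))          # sum_i delta_i sum_t v_t v_{t+i+g}
```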


For any $n$ the set $(1/\sqrt{T}) \sum_{t=1}^{T} v_t v_{t+1}, \ldots, (1/\sqrt{T}) \sum_{t=1}^{T} v_t v_{t+n}$ have a limiting normal distribution [Theorem 7.7.6 of T. W. Anderson (1971a), for example] with means 0 and covariances

(6.9)   $\frac{1}{T}\, \mathcal{E} \sum_{t,s=1}^{T} v_t v_{t+j} v_s v_{s+h} = \frac{1}{T} \sum_{t=1}^{T} \mathcal{E}\, v_t^2 v_{t+j} v_{t+h} = \sigma^4 , \quad j = h = 1, \ldots, n, \qquad = 0 , \quad j \ne h.$

Then the set

(6.10)   $\sum_{i=0}^{n-q} \delta_i \frac{1}{\sqrt{T} \sigma^2} \sum_{t=1}^{T} v_t v_{t+i+g} , \qquad g = 1, \ldots, q,$

has a limiting normal distribution with means 0 and covariances

(6.11)   $\frac{1}{\sigma^4} \sum_{i,j=0}^{n-q} \delta_i \delta_j \frac{1}{T}\, \mathcal{E} \sum_{t,s=1}^{T} v_t v_{t+i+g} v_s v_{s+j+h} = \sum_{i=0}^{n-q-|g-h|} \delta_i\, \delta_{i+|g-h|} ,$


which has the limit as $n \to \infty$ of (6.6). That the limiting distribution of (6.8) is the limit as $n \to \infty$ of the limiting distribution of (6.10) is justified by Corollary 7.7.1 of T. W. Anderson (1971a), for example. Note that

(6.12)   $\mathcal{E} \left[\sum_{i=n-q+1}^{T-g-1} \delta_i \frac{1}{\sqrt{T} \sigma^2} \sum_{t=1}^{T-(i+g)} v_t v_{t+i+g}\right]^2 \le \sum_{i=n-q+1}^{T-g-1} \delta_i^2 \le \sum_{i=n-q+1}^{\infty} \delta_i^2 .$

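Now consider the difference of (6.8) and (6.7), which is

(6.13)   $\frac{1}{\sqrt{T} \sigma^2}\, v' A^{-1} L^{g} v - \frac{1}{\sqrt{T}\, \hat\sigma_0^2}\, v' A' \hat A_0'^{-1} \hat A_0^{-1} L^{g} \hat A_0^{-1} A v + \frac{1}{\sqrt{T}} \operatorname{tr} \hat A_0^{-1} L^{g} A' \hat A_0'^{-1} .$

We write

(6.14)   $\hat A_0^{-1} = A^{-1} - \hat A_0^{-1} (\hat A_0 - A) A^{-1} .$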
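The identity (6.14) is purely algebraic and holds for any pair of nonsingular matrices. A short check with illustrative values standing in for the true coefficients and their initial estimates (the helper `poly_in_L` is a name chosen here) is:

```python
import numpy as np

def poly_in_L(coefs, T):
    """I + sum_g coefs[g-1] L^g."""
    return np.eye(T) + sum(c * np.eye(T, k=-(g + 1)) for g, c in enumerate(coefs))

T = 9
A = poly_in_L([0.5, -0.3], T)          # stands in for the true A
A_hat = poly_in_L([0.45, -0.25], T)    # stands in for the initial estimate of A

A_inv = np.linalg.inv(A)
A_hat_inv = np.linalg.inv(A_hat)

# Right-hand side of (6.14): A^{-1} - A_hat^{-1} (A_hat - A) A^{-1}
rhs = A_inv - A_hat_inv @ (A_hat - A) @ A_inv
```

The difference between `A_hat_inv` and `A_inv` is therefore of the same order as `A_hat - A`, which is what allows each correction term in the expansion that follows to be bounded.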

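Then (6.13) is

(6.15)   $\frac{1}{\sqrt{T} \sigma^2}\, v' A^{-1} L^{g} v - \frac{1}{\sqrt{T}\, \hat\sigma_0^2}\, v' A' \left(A'^{-1} - A'^{-1} (\hat A_0 - A)' \hat A_0'^{-1}\right) \left(A^{-1} - \hat A_0^{-1} (\hat A_0 - A) A^{-1}\right) L^{g} \left(A^{-1} - \hat A_0^{-1} (\hat A_0 - A) A^{-1}\right) A v + \frac{1}{\sqrt{T}} \operatorname{tr} \left(A^{-1} - \hat A_0^{-1} (\hat A_0 - A) A^{-1}\right) L^{g} A' \left(A'^{-1} - A'^{-1} (\hat A_0 - A)' \hat A_0'^{-1}\right)$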


_1_ 


\[-i -  4-)  +  4- r  ^  Vi 

(\G  °ol  °0 


+  4"  v’Aq1(Aq-A)A  XLgv  +  ~  v’A"1!8^1^  -A)v  -  tr  A-1LS ( A  -A )  ' A '  1 

°o . ao . 

°0  °0 

°0 

+  T'  (A0-A )  'A^A"1  (Aq-A )A~1L®Aq1  (Aq— A)v 


/X _ -)  /\ 


+  tr  A'^Aq-AU-1!,8  (Aq-A)  'A0  ^  . 


The first term on the right-hand side of (6.15) has probability limit 0 because (6.8) has a limiting normal distribution and $\operatorname{plim}_{T \to \infty} \hat\sigma_0^2 = \sigma^2 > 0$. Each of the third and fourth terms is


(6-l6)  i  ii-j,  (^o!-v.  jj?sj  ^riE+i+J+ki 


°0  ^ 


°o  k=1 


Let 


(6.17) 


\v  =  l  6.  —  y-  Ln  J  t 
j  = 


hT  L  i  - - , 

i=0  J  /F 


Then 


(6.18) 


We  can  write 
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(6-i9> 


With 


arbitrarily  high  probability  6^  <  pj  for  some  Pq  such  that 


0  <  Pq  <  p  <  1  .  Then  the  square  of  (6.19)  is  less  than 


(6.20) 


I 

i=0 


6°  '  2 


-i.  V  2V 

ypj  i=0  Pl  g+k+i’T  ' 


4r°°  2  2  \ 

Since  the  expected  value  of  the  second  sum  is  less  than  a  Zj=o  ^j/(i-Pj); 
(6.20)  is  bounded  in  probability.  Since  p  lim^^  Ojj°  ^  =  a^.  ,  (6.l6)  has 
probability  limit  0  .  The  second  term  and  fifth  term  give 


(6.21)  ~  —  v'  (An-A)'  A^  1  A  1  Lg  v  -  tr  (Aq-A)'  i'Q  1  A-1  Lg 


a2  ~°~  ~° 


_  l 


^2 

Oq^1 


I  (a(0)-  a,  )  [”  J  S°  6,  -i  A-  L'k+V+W  tr  L,k+iL«+J  ) 
=1  *  *  i  ,j=0  1  J  /F  V  ~  ~  ~  ' 


+  —  l  ?  6.  (a2  -  a2)  tr  L’k+1  Ls+t5 

vFij=0  1  J  0 


The sum of $\delta_j$ times the first parenthesis is treated like (6.17); note that the parenthesis has mean 0 and (6.18) as a bound on the expected value of its square. The same argument carries through. If $\sqrt{T} (\hat\sigma_0^2 - \sigma^2)$ is bounded in probability [or $\sqrt{T} (\hat\alpha_k^{(0)} - \alpha_k)$ is], then the second term converges to 0 in probability. The other terms in (6.15) are treated similarly.

It follows from these results that the solutions to (6.7), namely $\sqrt{T} (\hat\alpha_1^{(1)} - \alpha_1), \ldots, \sqrt{T} (\hat\alpha_q^{(1)} - \alpha_q)$, have a limiting normal distribution with means 0 and a covariance matrix that is the inverse of the information matrix.

The sample covariances $c_h$ defined for (2.48) are consistent estimates of $\sigma(h)$, $h = 0, 1, \ldots, p+q$. From these can be obtained consistent estimates of


&1»  •••  >  8  ,  °u(°)’  •••  ’  au(q)  and  of  g  ,  ...  ,  3. 


a. 


,  a 


q  ’ 


as  described  in  Section  5-8.1. 


and 
ABSTRACT

The autoregressive process with moving average residuals is a stationary process $\{y_t\}$ satisfying $\sum_{s=0}^{p} \beta_s y_{t-s} = \sum_{j=0}^{q} \alpha_j v_{t-j}$, where the sequence $\{v_t\}$ consists of independently identically distributed (unobservable) random variables. The distribution of $y_1, \ldots, y_T$ can be approximated by the distribution of the $T$-component vector $y$ satisfying $\sum_{s=0}^{p} \beta_s K_s y = \sum_{j=0}^{q} \alpha_j J_j v$, where $v$ has covariance matrix $\sigma^2 I$, $K_s = J_s = L^{s}$, and $L$ is the $T \times T$ matrix with 1's immediately below the main diagonal and 0's elsewhere. Maximum likelihood estimates are obtained when $v$ has a normal distribution. The method of scoring is used to find estimates defined by linear equations which are consistent, asymptotically normal, and asymptotically efficient (as $T \to \infty$). Several special cases are treated. It is shown how to calculate the estimates.